Visit our website at http://www.hp.com/hpj/journal.html

When we first put the Hewlett-Packard Journal on the Web in May 1994, our online issues
were offered in PostScript format and designed for the Mosaic browser. Today, we have a
webmaster, our online issues live in PDF (Portable Document Format) files, we have a nice
home page (see below), and we have many more features that take advantage of the latest
Web technology.

This month we are adding three new features to our site, and we thought it would be helpful to
describe the structure and content of our site.

• Every two months Current Issue is
updated with the latest issue. You can
find the latest issue in Current Issue
before it arrives in your mailbox.

• We have issues back to February
1994 in Past Issues. Since 1996 we
have been including links to other HP
sites containing relevant information
about the products, research areas,
or processes described in each
issue.

• To search for articles by title, subject,
product, or author, use Index, which
contains information about articles
going back to the first issue of the
Journal in 1949. Once you find an
article, there is an Order button that tells you how to order the article or issue of interest.

• Non-HP individuals in the U.S. can request a subscription to the Journal in Subscription
Information. Guidelines for international subscriptions are also contained in this section.

• To see articles that are not yet available in print, look in Previews.

• About The Journal contains information on legal disclaimers and submission to the Journal. 

• If you want to be notified by e-mail when a new Journal issue is available, fill in the form in 
E-Mail Notification. 

This month we will celebrate the 25th anniversary of the HP 35 handheld calculator by offering
online the original Journal issue, published in June 1972, which features that successful
technology.

Next month we plan to offer a new look to a previously published article titled "Measuring
Parasitic Capacitance and Inductance Using TDR," by David J. Dasher. Using animation,
several illustrations featuring the propagation of waveforms will be recreated. Whenever
possible, we hope to use current Web tools to help describe complex technologies.

In this Issue

In the last couple of years we have had the opportunity to chronicle the evolution
of design and development efforts associated with HP's newest microprocessors
based on the PA-RISC architecture. The HP PA 8000 and PA 8200 microprocessors
are the latest entries in this continuing evolution. The HP PA 8000 is the first HP
processor to implement PA-RISC 2.0 and the first capable of 64-bit operation.
Among the features included in the HP PA 8000 are four-way superscalar processing
and mechanisms for out-of-order execution, which maximize instruction-level
parallelism. The article on page 8 provides a brief overview of PA-RISC 2.0
and describes the key architectural features, implementation details, and system
performance attributes of these new microprocessors.

Like all processor designs, design for the HP PA 8000 microprocessor involved a series of trade-offs
between die area, complexity, performance, speed, power use, and design time. The article on page 16
discusses these trade-offs and the design methodologies used for the HP PA 8000 processor.

Because the advanced-microarchitecture PA 8000 microprocessor has so many new features, func- 
tional verification to identify defects that might cause the microprocessor to deviate from its specified 
behavior was quite a challenge. The article on page 22 describes the process and the tools involved in 
functional verification for the HP PA 8000 microprocessor.

Once it is verified that a processor will perform according to its specifications, the next step is to
characterize its behavior when it is pushed beyond its normal operating conditions. This process is called
electrical verification, and its use for the HP PA 8000 is described on page 32. The article describes how
shmoo plots are used to help analyze the results of varying different parameters, such as voltage and
temperature, and the debugging effort that follows the discovery of an anomaly during shmoo testing.
The layout of the interconnect metal for the HP PA 8000 required some new block routing technologies.
These technologies are embodied in a tool called PA_Route, which is described on page 40.

Telephone service today is more than just the transport of speech information some distance over
telephone lines. Advancements in communications technology and deregulation in the telecommunications
industry have meant the presence of more service providers competing to offer a wider range of
services other than just voice transport. As a result of all these changes, telephone networks have to be
more "intelligent" than they were in the past. The articles starting on page 46 describe the HP OpenCall
product, which is a collection of computer-based telecommunications platforms designed to offer a
foundation for telephony services based on intelligent networks. The advanced telephony services offered
today are carried on a separate signaling network from the voice transmission. The article on page 58
describes the HP OpenCall SS7 platform, which allows customers to build signaling applications
connected to the SS7 (Signaling System 7) signaling network. System reliability is something that customers
connected to large-scale networks take for granted. The article on page 65 discusses the active/standby
feature provided in HP OpenCall for achieving fault tolerance and high availability.

Because modern chemical analysis laboratories are so packed with instrumentation and other
paraphernalia, an instrument that provides some space economy is a big plus. The article on page 72 describes
the first benchtop inductively coupled plasma mass spectrometer, the HP 4500. This instrument is one
fifth the size of previous models and is small and light enough to be installed on an existing bench. The
HP 4500 has a new type of optics system which allows the instrument to perform analysis down to the
subnanogram-per-liter or parts-per-trillion (ppt) level. The application areas for the HP 4500 include the
semiconductor industry, environmental studies, laboratory research, and plant quality control.

Another essential aspect of a chemical analysis laboratory is the collection of data. With the array of
instruments creating data and the requirements of many regulatory agencies, data collection in
laboratories has become quite critical. Fortunately, many of today's laboratory instruments are automated and
connected to computer systems, making data collection a little easier. The problem is how to organize
and store this data. The article on page 80 describes an object database management system that is
used in the HP ChemStudy product for archiving and retrieving large amounts of complex historical
laboratory data. The article describes how historical data is managed and the mechanisms provided in
the object DBMS for managing this data.

One of the features of Asynchronous Transfer Mode (ATM) network technology is that it can satisfy the
quality-of-service needs of many different types of network traffic. To provide this level of service, the
ATM network must avoid network congestion, which causes unacceptable delays and data loss. Policing
the network is one of the key mechanisms used by ATM to avoid congestion. Policing is responsible for
monitoring the network to find connections that are potential sources of congestion. If such a connection
is found, policing can discard traffic from that connection. Given the importance of policing, it is essential
that the equipment responsible for doing the policing be thoroughly tested. The HP E4223A (page 90) is an
application that is designed to test policing implementations in ATM switches before the switches are
deployed for commercial service. The article describes network policing and explains how the HP E4223A
works to test policing in ATM switches.

The articles starting on page 96 are the last papers we have from HP's Design Technology Conference
of 1996. The first paper explores the concept of using MOSFET scaling parameters, such as channel
length and gate oxide thickness, to extrapolate scaling parameters for future MOSFET devices. The
paper on page 101 discusses using clock dithering as an on-chip technique to reduce EMI. The paper
surveys information from organizations inside and outside HP that have used clock dithering and
frequency modulation as an EMI reduction technique. The next paper (page 107) describes a project in
which a third-party microprocessor design was ported via its hardware description language (HDL)
specification instead of the traditional artwork port. This approach has the advantage of allowing the
processor to be optimized for HP's design process. The paper on page 114 describes circuit design
techniques and design trade-offs that were employed to design a 3V operational amplifier in the HP CMOS14
process. The last paper (page 121) analyzes the effects of lids on heat transfer in flip-chip packages.
The results from this analysis showed that although a lidless design shows better performance, more
research is needed.

C.L. Leath
Managing Editor 



Cover 

The cover shows the four-way superscalar HP PA 8000 microprocessor.



What's Ahead 

Since there will not be an October issue of the HP Journal, December 1997 is the next publication date. 
The December issue will feature a new design for the HP Journal and 12 articles on a very timely topic: 
high-speed network communications. The first two articles will discuss the future of this technology and
its impact on society. The remaining articles will focus on HP's R&D efforts in this area, particularly fiber
optics. The February 1998 issue will feature six more high-speed network communications articles.
These articles will focus on wireless communications.

Four-Way Superscalar PA-RISC Processors



The HP PA 8000 and PA 8200 PA-RISC CPUs feature an aggressive
four-way superscalar implementation, speculative execution, and
on-the-fly instruction reordering.

by Anne P. Scott, Kevin P. Burkhart, Ashok Kumar, Richard M. Blumberg,
and Gregory L. Ranson



The HP PA 8000 and PA 8200 PA-RISC CPUs are the first
implementations of a new generation of microprocessors
from Hewlett-Packard. The PA 8000 is among the world's
most powerful and advanced microprocessors, and at the
time of introduction in January 1996, the undisputed
performance leader. The PA 8200, introduced in June 1997,
continues this performance leadership with higher frequency,
larger caches, and several other enhancements. Both
processors feature an aggressive four-way superscalar
implementation, combining speculative execution with on-the-fly
instruction reordering. This paper discusses the objectives
for the design of these processors, some of the key
architectural features, implementation details, and system
performance. The operation of the instruction reorder buffer
(IRB), which provides out-of-order execution capability,
will also be described.

PA 8000 Design Objectives 

The primary design objective for the PA 8000 was to obtain
industry-leading performance on a broad range of real-world
applications. To sustain high performance on large applications,
not just on benchmarks, we designed large, external
primary caches with the ability to hide memory latency in
hardware. We also chose to implement dynamic instruction
reordering in hardware to maximize the instruction-level
parallelism available to the execution units. Another goal
was to provide full support for 64-bit applications. The
processor implements the new PA-RISC 2.0 architecture, which
is a binary compatible extension of the previous PA-RISC
architecture. All previous code will execute without
recompilation or translation. The processor also provides glueless
support for up to four-way multiprocessing via a
high-bandwidth Runway system bus. The Runway bus is a 768-Mbyte/s
split-transaction bus that allows each processor to have
several outstanding memory requests.

PA-RISC 2.0 Enhancements

The new PA-RISC 2.0 architecture incorporates a number of
advanced microarchitectural enhancements. Most of the
extensions involve support for 64-bit computing. Integer
registers and functional units, including the shift/merge
units, have been widened to 64 bits. Flat virtual addressing
up to 64 bits is supported, as are physical addresses greater
than 32 bits (40 bits were implemented on the PA 8000). A
new mode bit has been implemented that governs address
formation, creating increased flexibility. In 32-bit addressing
mode, it is still possible to take advantage of 64-bit compute
instructions for faster throughput. In 64-bit addressing
mode, 32-bit instructions and conditions are still available
for backwards compatibility.

Other extensions help optimize performance in the areas
of virtual memory and cache management, branching, and
floating-point operations. These include fast TLB (translation
lookaside buffer) insert instructions, load and store
instructions with 16-bit displacement, memory prefetch
instructions, support for variable-sized pages, halfword
instructions for multimedia support, branches with 22-bit
displacements and short pointers, branch prediction hinting,
floating-point multiply-accumulate instructions,
floating-point multiple compare result bits, and other carefully
selected features.

Hardware Design

The PA 8000 features a completely redesigned core that does
not leverage any circuitry from previous-generation HP
processors. This break from previous CPUs allowed us to
include new microarchitectural features we deemed necessary
for higher performance. Fig. 1 is a functional block diagram
of the processor showing the basic control and data paths.

The most notable feature of the chip, illustrated in the center
of the diagram, is the industry's largest instruction reorder
buffer of 56 entries, which serves as the central control unit.
This block supports full register renaming for all instructions
in the buffer, and tracks interdependencies between
instructions to allow data flow execution through the entire window.

FMAC = Floating-Point Multiply and Accumulate
ALU = Arithmetic Logic Unit

Fig. 1. Functional block diagram of the HP PA 8000 processor.

The PA 8000 features a peak execution rate of four instructions
per cycle, made possible by a large complement of
computational units, located on the left side of the diagram.
For integer operation, two 64-bit integer ALUs and two
64-bit shift/merge units are included. All integer functional
units have a single-cycle latency. For floating-point
applications, dual floating-point multiply and accumulate (FMAC)
units and dual divide/square root units are included. The
FMAC units are optimized for performing the very common
operation A times B plus C. By fusing an add to a multiply,
each FMAC can execute two floating-point operations in just
three cycles. In addition to providing low latency for
floating-point operations, the FMAC units are fully pipelined so that
the peak floating-point throughput of the PA 8000 is four
floating-point operations per cycle. The two divide/square
root units are not pipelined, but other floating-point
operations can be executed on the FMAC units while the divide/
square root units are busy. A single-precision divide or square
root operation requires 17 cycles, while double precision
requires 31 cycles.
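The throughput figures in this paragraph follow from simple arithmetic, which the short Python sketch below works through. The constant and function names are illustrative only, and the 180-MHz clock used in the usage note is borrowed from the Performance section of this article; the result is a peak rate, not a measured one.

```python
# Peak floating-point arithmetic for a design with two fully pipelined
# FMAC units and two unpipelined divide/square root units.
# All names here are illustrative, not taken from HP documentation.

FMAC_UNITS = 2          # dual FMAC units
FLOPS_PER_FMAC_OP = 2   # a fused multiply-add counts as two operations
DIV_SQRT_UNITS = 2      # dual divide/square root units


def peak_flops_per_cycle():
    """Fully pipelined units can each start one fused operation per cycle."""
    return FMAC_UNITS * FLOPS_PER_FMAC_OP


def peak_mflops(clock_mhz):
    """Peak rate in millions of floating-point operations per second."""
    return peak_flops_per_cycle() * clock_mhz


def div_sqrt_ops_per_cycle(latency_cycles):
    """Unpipelined units each finish one operation per latency period."""
    return DIV_SQRT_UNITS / latency_cycles
```

For example, `peak_mflops(180)` gives 720, while `div_sqrt_ops_per_cycle(31)` shows that double-precision divides complete at well under one per ten cycles, which is why it matters that the FMAC units stay available while the divide/square root units are busy.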

Having such a large array of computation units would be
pointless if those units could not be supplied with enough
data upon which to operate. To this end, the PA 8000
incorporates two complete load/store pipes, including two address
adders, a 96-entry dual-ported TLB, and a dual-ported cache.
The right side of Fig. 1 shows the dual load/store units and
the memory system interface. The symmetry of dual
functional units throughout the processor allows a number of
simplifications in the data paths, the control logic, and signal
routing. In effect, this duality provides for separate even and
odd machines.

As pipelines get deeper and the parallelism of a processor
increases, instruction fetch bandwidth and branch prediction
become increasingly important. To increase fetch bandwidth
and mitigate the effect of pipeline stalls for branches
predicted to be taken, the PA 8000 incorporates a 32-entry
branch target address cache, or BTAC. This unit is a fully
associative structure that associates the address of a branch
instruction with the address of its target. Whenever a branch
predicted to be taken is encountered in the instruction
stream, an entry is created in the BTAC for that branch.
The next time the fetch unit fetches from the address of the
branch, the BTAC signals a hit and supplies the address of
the branch target. The fetch unit can then immediately fetch
the target of the branch without incurring any penalty,
resulting in a zero-state taken branch penalty for branches
that hit in the BTAC. In an effort to improve the hit rate, only
branches predicted to be taken are kept in the BTAC. If a
branch hits in the BTAC but is predicted not to be taken, the
entry is deleted.
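The BTAC behavior described above can be sketched in a few lines of Python. This is a toy software model, not the hardware design: the class and method names are invented for illustration, and the oldest-first replacement used when the table is full is an assumption, since the article does not state the replacement policy.

```python
from collections import OrderedDict


class BranchTargetAddressCache:
    """Toy model of a small, fully associative branch target cache."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()  # branch address -> target address

    def lookup(self, fetch_address):
        """Return the cached target on a hit, or None on a miss."""
        return self.entries.get(fetch_address)

    def update(self, branch_address, target_address, predicted_taken):
        """Create an entry for a predicted-taken branch; delete otherwise."""
        if predicted_taken:
            is_new = branch_address not in self.entries
            if is_new and len(self.entries) >= self.capacity:
                # Assumed policy: evict the oldest entry when full.
                self.entries.popitem(last=False)
            self.entries[branch_address] = target_address
        else:
            # A branch predicted not taken does not stay in the table.
            self.entries.pop(branch_address, None)
```

Because only predicted-taken branches occupy entries, all 32 slots are spent on branches that can actually redirect fetch, which is the hit-rate argument made in the text.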

To reduce the number of mispredicted branches, the PA 8000
implements two modes of branch prediction: dynamic mode
and static mode. Each TLB entry has a bit to indicate which
prediction mode to use. Thus, the mode is selectable on a
page-by-page basis. In dynamic prediction mode, a 256-entry
branch history table, or BHT, is consulted. The BHT stores
the results of the last three iterations of each branch (either
taken or not taken), and the instruction fetch unit predicts
that the outcome of a given branch will be the same as the
majority of the last three outcomes. In static prediction mode,
the PA 8000 predicts most conditional forward branches to
be untaken, and most conditional backward branches to be
taken. For the common compare-and-branch instruction, the
PA-RISC 2.0 architecture defines a branch prediction bit that
indicates whether this normal prediction convention should
be followed or whether the opposite convention should be
used. Compilers using either heuristic methods or profile-
based optimization can use static prediction mode to
communicate branch probabilities effectively to the hardware.
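A minimal Python sketch of the two prediction modes follows. It is an illustration under stated assumptions rather than a description of the actual circuit: the table indexing by word address, the all-not-taken initial state, and the names `BranchHistoryTable`, `static_predict`, and `hint_reverse` are all hypothetical, and the static rule is simplified to apply to every conditional branch rather than "most" of them.

```python
class BranchHistoryTable:
    """Majority-of-last-three dynamic predictor (illustrative model)."""

    def __init__(self, entries=256):
        self.entries = entries
        # Each entry is a 3-bit shift register of recent outcomes.
        # Assumed initial state: all outcomes recorded as not taken.
        self.history = [0] * entries

    def _index(self, branch_address):
        # Hypothetical indexing: word address modulo the table size.
        return (branch_address >> 2) % self.entries

    def predict(self, branch_address):
        """Predict taken when at least two of the last three were taken."""
        bits = self.history[self._index(branch_address)]
        return bin(bits).count("1") >= 2

    def record(self, branch_address, taken):
        """Shift the newest outcome in and drop the oldest."""
        i = self._index(branch_address)
        self.history[i] = ((self.history[i] << 1) | int(taken)) & 0b111


def static_predict(branch_address, target_address, hint_reverse=False):
    """Backward branches predicted taken, forward untaken, unless reversed."""
    backward = target_address < branch_address
    return backward != hint_reverse
```

The `hint_reverse` flag stands in for the branch prediction bit that a compiler would set when profile data or heuristics contradict the forward/backward convention.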

Cache Design 

The PA 8000 features large, single-level, off-chip,
direct-mapped instruction and data caches. Both caches support
configurations of up to four megabytes using industry-standard
synchronous SRAMs. Two complete copies of the data cache
tags are provided so that two independent accesses can be
accommodated and need not be to the same cache line.

Why did we design the processor without on-chip caches?
The main reason is performance. Competing designs
incorporate small on-chip caches to enable higher clock frequencies.
Small on-chip caches support benchmark performance but
fade on large applications, so we felt we could make better
use of the die area. The sophisticated IRB allows us to hide
the effects of a pipelined two-state cache latency. In fact,
our simulations demonstrated only a 5% performance
improvement if the cache were on-chip and had a single-cycle
latency. The flat cache hierarchy also eliminates the design
complexity associated with a two-level cache design.

Chip Statistics 

The PA 8000 is fabricated in HP's 0.5-micrometer, 3.3-volt
CMOS process. Although the drawn geometries are not very
aggressive, we still obtain a respectable 0.28-µm effective
channel length (Leff). In addition, extensive investment was
made in the design process to ensure that both layout and
circuits would scale easily into more advanced technologies
with smaller geometries. There are five metal layers: two for
tight pitch routing and local interconnect, two for low-RC
global routing, and a final layer for clock and power supply
routing.

The processor is designed with a three-level clock network,
organized as a modified H-tree (see article, page 16). The
clock sync signals serve as primary inputs. They are
received by a central buffer and driven to twelve secondary
clock buffers located in strategic spots around the chip.
These buffers then drive the clock to the major circuit areas,
where it is received by clock gaters featuring high gain and
a very short input-to-output delay. There are approximately
7,000 of these gaters, which have the ability to generate
many flavors of the clock: two-phase overlapping or
nonoverlapping, inverting or noninverting, qualified or
nonqualified. The qualification of clocks is useful for synchronous
register sets and dumps, as well as for powering down
sections of logic when not in use. Extensive simulation and
tuning of the clock network were done to minimize clock
skew and improve edge rates. The final clock skew for this
design was simulated to be no greater than 170 ps between
any two points on the die.

Under nominal operating conditions of room temperature
and 3.3-volt power supplies, the chip is capable of running at
frequencies up to 250 MHz. Although we cannot guarantee
processor performance based on results obtained under
ideal conditions, there appears to be an opportunity for
greater frequency enhancement. The die measures 17.68 mm
by 19.1 mm and contains 3.8 million transistors.
Approximately 75% of the chip is either full-custom or semicustom.
A photograph of the die with all major areas labeled is
shown in Fig. 2. Again, the IRB is in the center of the chip,
providing convenient access to all the functional units. The
integer data path is on the left side of the chip, while the
right side contains the floating-point data path.

Fig. 2. PA 8000 CPU with major areas labeled.

By using flip-chip packaging technology, we were able to
support a very large number of I/O signals: 704 in all. In
addition to the I/O signals, 1,200 power and ground solder
bumps are connected to the 1,085-pin package via a land
grid array. There are fewer pins than the total of the I/Os
and bumps because each power and ground pin can be
connected to multiple bumps. A picture of the packaged part is
shown in Fig. 3. The chip is flipped onto the ceramic carrier
using solder bump interconnect, and the carrier is mounted
on a conventional printed circuit board. This packaging has
several advantages. The wide off-chip caches are made
possible by the high-pin-count capability. The ability to place
I/O signals anywhere on the die improves area utilization
and reduces on-chip RC delays. Finally, the low inductance
of the signal and power supply paths reduces noise and
propagation delays.

Fig. 3. Packaged PA 8000 CPU.

Performance 

At 180 MHz with one megabyte of instruction cache and
one megabyte of data cache, the HP PA 8000 delivers over
11.8 SPECint95 and greater than 20.2 SPECfp95, making it
the world's fastest processor at the time of introduction. A
four-way multiprocessor system has also produced 14,739.03
TpmC ($132.25/TpmC), where TpmC is an industry-standard
benchmark for online transaction processing. That system
configuration was made available in June 1996.

Enabling the PA 8000 to achieve this level of performance
are several distinguishing features. First, there are a large
number of functional units: ten, as described previously.
However, multiple units alone are not enough. To sustain
superscalar operation beyond two-way demands advanced
instruction scheduling methods to supply a steady stream of
independent tasks to the functional units. To achieve this
goal, an aggressive out-of-order execution capability was
incorporated. The instruction reorder buffer provides a
large window of available instructions combined with a
robust dependency tracking system.

Second, having explicit compiler options to generate hints
to the processor helps a great deal. These special instructions
can be used to prefetch data and to communicate statically
predicted branch behavior to the branch history table, as
described previously.

Finally, the system bus interface is capable of tracking up to
ten pending data cache misses, an instruction cache miss,
and an instruction cache prefetch. Since multiple misses can
be serviced in parallel, the average performance penalty
caused by each is reduced.

Instruction Reorder Buffer

Because of restrictions on compiler scheduling, a key
decision was made to have the PA 8000 perform its own
instruction scheduling. To accomplish this task, the PA 8000 is
equipped with an instruction reorder buffer, or IRB, which
can hold up to 56 instructions. This buffer is composed of
two pieces: the ALU buffer, which can store up to 28
computation instructions, and the MEM (memory) buffer, which can
hold up to 28 load and store instructions. These buffers track
over a dozen different types of interdependencies between
the instructions they contain, and allow instructions
anywhere in the window to execute as soon as they are ready.

As a special feature, the IRB tracks branch prediction
outcomes, and when a misprediction is identified, all
instructions that were incorrectly fetched are flash-invalidated.
Fetching then resumes down the correct path without any
further wasted cycles.

The IRB serves as the central control logic for the entire
chip, yet consists of only 850,000 transistors and consumes
less than 20% of the die area. A high-performance IRB is of
paramount importance, since today's compilers simply lack
runtime information, which is useful for optimal scheduling.
The reorder buffer on the PA 8000 is 40% larger than that of
the nearest competitor.

Instruction reordering also leads to the solution for another
bottleneck: memory latency. Although the dual load/store
pipes keep the computation units busy as long as the data is
cache-resident, a data cache miss can still cause a disruption.
Execution can continue for many cycles on instructions that
do not depend on the data cache miss. The PA 8000 can
execute instructions well past the load or store that was missed,
since the IRB can hold so many instructions. When useful
work can be accomplished during a data cache miss latency,
the net impact on performance is significantly reduced.

The large window of available instructions also allows
overlap of multiple data cache misses. If a second data cache
miss is detected while an earlier miss is still being serviced
by main memory, the second miss will be issued to the
system bus as well.

Life of an Instruction 

A block diagram of the PA 8000's instruction reorder buffer
is shown in Fig. 4. Instructions enter through the sort block
and are routed to the appropriate portion of the IRB based
on instruction type, where they are held until they retire. The
functional units are connected to the appropriate section of
the IRB based on what types of instructions they execute.
After execution, instructions are removed from the system
through the retire block.

Instruction Insertion. The IRB must be kept as full as possible
to maximize the chances that four instructions are ready to
execute on a given cycle. A high-performance fetch unit was
designed to maximize IRB occupancy. This unit fetches, in
program order, up to four instructions per cycle from the
single-level off-chip instruction cache.

Limited predecode is then performed, and the instructions
are inserted in a round-robin fashion into the appropriate
IRB. Each IRB segment must be able to handle four incoming
instructions per cycle, since there are no restrictions on the
mix of instructions being inserted.

There are several special cases. Branches, although executed
from the ALU IRB, are also stored in the MEM IRB as a
placeholder to indicate which entries to invalidate after a
mispredicted branch. Instructions that have both a
computation and a memory component and two targets, such as the
load word and modify (LDWM) instruction, are split into two
pieces and occupy an entry in both portions of the IRB.

Instruction Launch. Instructions are allowed to execute out of
order. During every cycle, both segments of the IRB allow
the oldest even and the oldest odd instruction for which all
operands are available to execute on the functional units.
Thus, up to four instructions can be executed at once: two
computation instructions and two memory reference
instructions. Once an instruction has been executed, its result
is held in a temporary rename register and made available
for use by subsequent instructions.
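The launch rule described above (oldest ready even instruction plus oldest ready odd instruction, in each of the two segments) can be sketched as follows. This is a simplified software model, not the hardware selection logic: the `Entry` fields and function names are invented, and treating "even" and "odd" as the parity of a buffer slot number is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    slot: int               # buffer slot; parity selects even/odd (assumed)
    age: int                # lower value means older in program order
    ready: bool             # True when all operands are available
    launched: bool = False  # set once the entry has been sent to a unit


def select_launch(segment):
    """Pick the oldest ready even entry and the oldest ready odd entry."""
    picks = []
    for parity in (0, 1):
        candidates = [
            e for e in segment
            if e.slot % 2 == parity and e.ready and not e.launched
        ]
        if candidates:
            picks.append(min(candidates, key=lambda e: e.age))
    return picks


def launch_cycle(alu_segment, mem_segment):
    """One cycle: at most two launches per segment, four in total."""
    launched = select_launch(alu_segment) + select_launch(mem_segment)
    for entry in launched:
        entry.launched = True
    return launched
```

Note that a younger instruction can launch ahead of an older one whose operands are not yet available, which is the out-of-order behavior the paragraph describes.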

Instruction Retire. Instructions are removed or retired from
the IRB in program order once they have executed and any
exceptions have been detected. Enforcing strict retirement
order provides software with a precise exception model. As
instructions are retired, the contents of the rename registers
are transferred to the general registers, stores are placed in
a queue to be written to cache, and instruction results are
committed to the architected state. The retire unit can handle
up to two ALU or floating-point instructions and up to two
memory instructions each cycle.
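In-order retirement from one IRB segment can be sketched as a queue that only ever releases entries from its head. The dictionary keys (`executed`, `dest`, `rename_value`, `store`) and the function name are hypothetical, and exception handling is left out for brevity; the point is only that a younger, already executed instruction waits behind an older one that has not finished.

```python
from collections import deque


def retire_cycle(segment, general_registers, store_queue, max_retire=2):
    """Retire up to max_retire executed entries from the head, in order."""
    retired = 0
    while segment and retired < max_retire and segment[0]["executed"]:
        entry = segment.popleft()
        if entry.get("dest") is not None:
            # Copy the rename register contents to the general register.
            general_registers[entry["dest"]] = entry["rename_value"]
        if entry.get("store") is not None:
            # Stores are queued here and written to cache later.
            store_queue.append(entry["store"])
        retired += 1
    return retired
```

Because architected state changes only at retirement, an exception on the head instruction would leave the general registers reflecting exactly the instructions before it, which is what makes the exception model precise.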

The HP PA 8200 Processor

After the successful introduction of PA 8000 processor-based
products, the PA 8000 design team initiated a follow-up
program. Performance analysis on key applications identified
several opportunities for future products. The PA 8200 CPU
team formulated a plan for improvement based on the
following goals set by HP customers and management:

• Improved performance

• Compatibility with existing applications

• Leverage of the PA 8000 design foundation

• Rapid time to market.

Improved Performance. Application trace studies identified
branch prediction, TLB miss rates, and increased cache
sizes as significant opportunities. The availability of
next-generation 4M-bit SRAMs with improved access times
allowed the design team to increase processor clock speed
and double cache size to 2M bytes for both the instruction
cache and the data cache. The faster access time of 4M-bit
SRAMs allowed higher processor clock rates without
changes to the cache access protocol. The combination
of increased clock frequency, larger caches, improvement
of branch prediction accuracy, and reduction of TLB miss
rates enables peifont^ance improvements of 15% to -MIH} on 
key ajipli cations. 

Compatibility with Existing Applications. Follow-on products
using the PA 8200 had to preserve our customers' investment
in PA 7200-based and PA 8000-based software and
hardware. It was considered essential to maintain binary
compatibility with existing PA-RISC applications and
provide an upgrade path for improved performance.



Leverage of PA 8000. The PA 8200 design team leveraged the
extensive functional and electrical verification results
accumulated during the prototyping phase of the PA 8000
development. A wealth of design data is collected in the
process of turning a design into a product. This information
identified the paths limiting CPU operating speed and the
performance limiters in the branch and TLB units.
Characterization of the PA 8000 cache design provided the
basis for a new design using high-speed 4M-bit SRAMs.

Rapid Time to Market. The competitive situation dictated that
speed upgrades to the PA 8000 were needed to maintain
HP's performance leadership in the high-performance
workstation and midrange server markets. Therefore, design
changes and characterization of the expanded cache
subsystem had to be completed within a very aggressive
schedule.

In the following sections, PA 8200 design changes to the
PA 8000 processor will be detailed.

PA 8000 Performance Analysis

Given the goals of increased performance with low risk and
a short time to market, it was necessary to understand fully
where the PA 8000 excelled and where significant
improvements could be made. Key customer applications were
examined to determine how real-world code streams were
being executed on the PA 8000.

For the PA 8000, the expectation was set that no code
recompilation would be necessary to see a 2x speedup over
the PA 7200. We did not want to change this expectation for
the PA 8200, so all code experiments were performed using
nonrecompiled, nontuned code. It was shown that the
PA 8200's performance could be enhanced significantly over
that of the PA 8000 by reducing the amount of time the
PA 8200 spent waiting for instructions or data. The branch
history table (BHT) and translation lookaside buffer (TLB)
are architectural features that are intended to reduce wasted
cycles resulting from penalties, particularly in pipelined
machines. For mispredicted branches, TLB misses, and
cache misses, the number of penalty cycles increased from
the PA 7200 to the PA 8000. It was expected that a
corresponding reduction in mispredictions and misses and the
ability to hide penalty cycles using out-of-order execution
would result in an overall decrease of wasted cycles. The
analysis of the application suite showed otherwise, as the
number of wasted cycles increased from the PA 7200 to the
PA 8000, accounting for 30 to 50 percent of the total number
of cycles spent on each instruction (CPI). If the number of
mispredictions and misses could be decreased, a significant
performance boost would be realized. As a result, increases
in the size of the BHT, TLB, and caches were examined as
potential high-benefit, low-risk improvements to the PA 8000.

BHT Improvement 

The biggest performance weakness observed was the
mispredicted branch penalty. By its nature, out-of-order
execution increases the average penalty for mispredicted
branches. Therefore, significant design resources were
allocated for the PA 8000's branch prediction scheme to lower
the misprediction rate, thereby offsetting the higher penalty.
The results of the performance analysis revealed that cycles
wasted because of branch penalties were still significantly
impacting performance. Relative to the PA 7200, the
misprediction rate is generally about 50% lower across the
sample workload of technical applications. However, the cycle
penalty for a mispredicted branch rose by 200%, more than
offsetting the reduction in miss rate. There are clearly two
possible solutions: decreasing the miss rate or decreasing
the miss penalty. Because of the short time schedule of the
program, redefining how mispredicted branches are handled to
reduce the penalty was not a viable alternative. The more
practical solution was to improve branch prediction accuracy.
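
The trade-off above is simple arithmetic: cycles wasted on branches scale with the misprediction rate times the penalty per misprediction. The sketch below uses round figures as we read them from the text (a rate about half as large and a penalty about three times as large). The function name and the assumption of an unchanged branch count are ours.

```python
def relative_branch_cost(rate_change, penalty_change):
    """Wasted branch cycles after a change, as a multiple of the old value.

    Both arguments are fractional changes, e.g. -0.50 for a 50% reduction
    and 2.00 for a 200% increase. Wasted cycles are modeled as
    (mispredictions) x (penalty per misprediction).
    """
    return (1.0 + rate_change) * (1.0 + penalty_change)


ratio = relative_branch_cost(rate_change=-0.50, penalty_change=2.00)
print(f"wasted branch cycles: {ratio:.2f}x the earlier value")
```

Under these assumptions, halving the miss rate while tripling the penalty still leaves about one and a half times as many cycles lost to branches, which is why the miss rate had to come down further.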

Improvements to the BHT focused on two areas. The first
was the table size and the second was the branch prediction
algorithm. The PA 8000 uses a three-bit majority vote
algorithm and a 256-entry BHT. Since the PA 8000 also allows
up to two branches to retire simultaneously, the table ideally
would be able to update two entries per cycle. Parallel BHT
update was not implemented on the PA 8000, resulting in the
outcome of one of the branches not having its information
entered into the BHT. Analysis of this limitation revealed a
minor penalty that could easily be eliminated in the PA 8200.

Initial investigation for BHT improvements focused on the
size of the table since it is easier to increase the size of an
existing structure than to start from scratch and redefine the
algorithm. To have the minimum impact on control logic, it
was desirable to increase the table size by a multiple of two.
Visual inspection of the area around the BHT revealed that
the number of entries could be increased to 512 with little
impact.

Next, possible changes to the prediction algorithm were
explored. Using a more common algorithm became the key
to allowing the BHT to grow to 1024 entries. The new
algorithm requires only two bits of data compared to the
three-bit algorithm implemented on the PA 8000. Analysis of
the two algorithms showed that they result in almost the same
predictions with only a few exceptions. The reduction in the
number of bits per entry from three to two allowed the BHT
to grow from 512 to 1024 entries. The increase in table size
made possible by the algorithm change was shown through
simulation to provide more of an incremental improvement
than was lost by the switch to the two-bit algorithm.
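
To illustrate how a three-bit and a two-bit predictor can agree most of the time, the sketch below compares a three-bit majority vote of the last three outcomes with a two-bit saturating counter. The article does not name the two-bit algorithm, so the saturating counter is our assumption, as are the function names and the all-not-taken starting state.

```python
from itertools import product


def run_majority3(outcomes):
    """Predict taken when at least two of the last three outcomes were taken."""
    history = (0, 0, 0)
    predictions = []
    for taken in outcomes:
        predictions.append(sum(history) >= 2)
        history = history[1:] + (int(taken),)
    return predictions


def run_counter2(outcomes):
    """Two-bit saturating counter: states 0..3, predict taken when >= 2."""
    counter = 0
    predictions = []
    for taken in outcomes:
        predictions.append(counter >= 2)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return predictions


def agreement(length=8):
    """Fraction of predictions on which the two schemes agree,
    taken over every possible outcome sequence of the given length."""
    same = total = 0
    for outcomes in product((0, 1), repeat=length):
        a = run_majority3(outcomes)
        b = run_counter2(outcomes)
        same += sum(x == y for x, y in zip(a, b))
        total += length
    return same / total
```

On a steadily taken branch both schemes settle on the same prediction after two outcomes. They diverge on patterns such as strict alternation, where the majority vote occasionally predicts taken and the counter in this sketch does not.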

One additional improvement was made to the BHT concerning
the handling of multiple branches retiring at the same
time. Allowing two entries to be updated simultaneously
required the data entries to have two write ports. This
functionality was not included in the PA 8000, so implementing
a two-port solution on the PA 8200 would be very expensive in
die area. Therefore, a control-based solution was devised.
When two branches retire on the same cycle, the information
necessary to update the BHT for one of the branches is held
in a one-entry queue. On the next cycle, the data in the queue
is used to update the table. If another branch also retires on
the next cycle, the queue data is written into the BHT and
the newly retiring branch's data is stored in the queue. Only
if two branches retire while the queue contains data is the
data for one branch lost. This condition is considered to be
quite rare, since it requires that multiple pairs of branches
retire consecutively. The rarity of this situation makes the
performance impact of losing the fourth consecutive branch's
data negligible.
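
The queueing rule above can be modeled cycle by cycle. The following is a behavioral sketch under our own assumptions: one BHT write per cycle, a one-entry queue that always drains first, and a hypothetical `simulate` function whose input is the number of branches retiring in each cycle.

```python
def simulate(retire_counts):
    """Model a single-write-port BHT with a one-entry branch store queue.

    retire_counts[i] is the number of branches (0, 1, or 2) retiring in
    cycle i. Returns (written, lost, still_queued).
    """
    queued = False
    written = lost = 0
    for n in retire_counts:
        if n not in (0, 1, 2):
            raise ValueError("at most two branches retire per cycle")
        if queued:
            # The write port is spent draining the queued update.
            written += 1
            queued = False
            if n >= 1:
                queued = True   # one new branch takes the queue slot
            if n == 2:
                lost += 1       # no port and no slot left for the other
        else:
            if n >= 1:
                written += 1    # first branch uses the write port
            if n == 2:
                queued = True   # second branch waits one cycle
    return written, lost, queued
```

With two branches retiring in each of two consecutive cycles, `simulate([2, 2])` records two writes, one update still queued, and one update lost: the fourth consecutive branch, as described above.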



The risk involved with making the described changes to the
BHT was relatively low. The data storage elements are
well-understood structures and could be expanded with little
risk. The control for the new BHT could mostly be leveraged
from the PA 8000 implementation with the exception of the
new branch store queue. Significant functional verification
was done to ensure correctness of the new BHT. Since
control and data paths remained almost the same as the old
BHT, there was high confidence that the changes would not
introduce new frequency limiters.

TLB Improvement

The second major area of improvement involved the TLB.
Relative to the PA 7200, the PA 8000 uses significantly more
cycles handling TLB misses on most of the applications used
to analyze performance. The reason for this increase is
twofold. First, the penalty for a TLB miss increased from
26 cycles on the PA 7200 to 67 cycles on the PA 8000. The
increase in TLB miss penalty was mainly caused by an
increase in control complexity resulting from the
out-of-order capability of the PA 8000. Second, the TLB miss
rate for most of the applications examined also increased.
The total number of entries decreased by 20% from 120 to 96
between the PA 7200 and the PA 8000. However, the PA 8000
has a combined instruction and data TLB while the PA 7200
has separate instruction and data TLBs. At the time, a
decrease in size seemed an acceptable trade-off since
instruction and data TLB entries could now use the entire TLB.

Since the penalty for a TLB miss could not be reduced
without significant redefinition of how a TLB miss is handled,
the number of entries was the area of focus. Simulation
revealed that increasing the number of entries provided a
nearly linear improvement in the TLB miss rate, leveling off
at about 128 entries. In looking at the area the TLB occupied
and the surrounding routing channels, it became clear that
128 entries would involve an unacceptable design risk. Since
the implementation is most efficient in multiples of 8, we
next examined 120 entries. Initial examination of the
artwork showed that this target would be aggressive, yet
reasonable. Simulations were done assuming 128 entries to
provide some additional timing margin and to allow for
increasing to 128 entries if it became possible. Most of the
circuit timing paths were found to have nearly the same
performance with 120 entries as 96 entries since the critical
variable for timing is generally the width of an entry and not
the number of entries. Some minor changes to transistor
sizing provided the additional margin necessary on critical
paths that traversed the TLB array. The goal of these changes
was to increase the number of TLB entries over the PA 8000
without impacting speed.

The biggest risk that the TLB changes posed was to the
project schedule. The area affected by the changes was
much larger than that of any other change, and there were
hard boundaries to other functional units that constrained
design area. To increase the size of the TLB, two complex
signal channels were rerouted. Although necessary to
provide the additional room, the changes were time-consuming
and presented significant schedule risk. Routing changes
also increased the chance of a change in the electrical
performance of the affected signals. To minimize this risk, a
tool was written to verify that signal integrity was not
compromised. Overall, the rerouting of the channels was the
critical path to tape release and also the highest risk.

Frequency Improvement

In addition to improving the BHT and TLB performance, the
target frequency for the PA 8200 was increased over that of
the PA 8000. We took a two-pronged approach to timing
analysis. The first approach consisted of analyzing a
software model of PA 8000 timing and the second approach
consisted of examining data from prototype systems in which
we increased the frequency to the failing point.

The PA 8000 timing was modeled using Epic's Pathmill and
Timemill suite and Verilog's Veritime. These tools provided
an ordered set of paths ranked according to predicted
operation frequency. We grouped the data into paths that were
internal to the chip (core paths) and paths that received or
drove information to the cache pins (cache paths). It became
readily apparent that there was a very small set of core paths
and a much larger set of cache paths that could potentially
limit chip frequency. The core paths tended to be
independent of all other core paths and could be improved on
an individual basis within the CPU. The cache path limiters
tended to funnel into a couple of key juncture points and
could be globally improved by addressing those points.
As an additional degree of freedom, cache paths could be
addressed through a combination of CPU, board, and cache
SRAM improvements.

Once it was determined which core paths might limit chip
frequency, we had to devise a method to correlate the
simulated frequency with actual chip performance. Targeted
tests were written to exercise potential core limiters. Paths
were chosen based on their independence from known limiters
and for their ability to be completely controlled by the test.
The targeted tests ran consistently faster on silicon than the
model predicted, giving us confidence that core paths would
not be frequency limiters.

We then looked at correlating cache paths between the model
and the system. Cache paths tend to be multistate paths
dependent on the timing of the cache SRAMs. Because of
these attributes, it was not feasible to craft a chip-level test
to exercise specific cache paths. Therefore, we decided to
rely upon system data for determining worst-case cache
paths and then use the model data to show the frequency of
other cache paths relative to the worst case. System work
revealed two cache path frequency limiters. Both paths
were predicted by and correlated with the timing model.

Based on the cache paths exposed through system work, an
additional timing investigation was launched. Both paths
funneled into a similar set of circuits to send addresses to
the SRAMs. All other inputs into those circuits were
examined and individually simulated using SPICE to determine
if they had the potential to become frequency limiters. From
this effort, one additional set of inputs was identified as
having a high risk of becoming a frequency limiter once the
known limiters were improved. The proposed improvements
to the known limiters improved the newly identified path as
well, keeping it from becoming a critical path.

The final step taken to understand the frequency limitations
of the PA 8000 was to devise a way to look beyond the known
limiting paths in a system. The lowest frequency speed limiter
was a cache path related to an architectural feature to
improve performance. On the PA 8000, this feature can be
disabled. However, the second speed limiter was not
programmable and was therefore capable of masking other
paths. We turned to focused ion beam (FIB) technology to
help us solve this problem.

The second speed limiter was a single-phase path that started
with the rising edge of a clock and ended with the falling
edge of a derived clock. By delaying the falling edge of the
derived clock, we could increase the frequency at which the
path could run, creating a new region in which we could
search for failing paths. We used the FIB to cut away and
rebuild the circuitry for the derived clock. In the process of
stripping away the metal on the chip and then redepositing
it to rebuild the circuit, resistance is added, slowing down the
circuit. We were able to add 220 ps to the path, increasing the
failing frequency for this limiter by approximately 22 MHz.
The FIB-modified chip was placed in a system for extensive
testing. No additional failing paths were found in the newly
opened frequency region.

In improving the critical paths for the PA 8200, a conservative
design approach was adopted. Most of the improvements
involved moving clock edges, allowing latches to update
earlier than before. Such changes can expose races or setup
violations. The paths were carefully simulated to eliminate
the risk of introducing a race. In cases where it was difficult
to precisely determine the setup time needed for a signal,
conservative changes were made.

Cache Improvement 

Yet another area for improvement on the PA 8200 was the
cache subsystem. The cache size plays an integral role in
determining how well the system performs on both
applications and benchmarks. In addition, the off-chip cache
access path can limit the operating frequency of the system
because of the tight coupling between the CPU and the SRAMs.

The PA 8000 offered a maximum cache size of 1M bytes for
both the instruction and data caches. A total of 20 1M-bit
industry-standard late-write synchronous SRAMs were
employed for this configuration. The printed circuit board
design was cumbersome because of the large number of
SRAM sites. The design resulted in relatively long round-trip
delays. As the PA 8200 was being defined, the next generation
of SRAMs became available. These 4M-bit parts were fully
backwards compatible with those used with the PA 8000.
The emergence of these higher-density components made
possible a 2M-byte instruction cache and a 2M-byte data
cache while reducing the number of SRAMs to 12. The
resulting board layout was more optimal, contributing to
shorter routes and better signal integrity.

In addition to cache size, the frequency limitation of the
off-chip cache was carefully addressed. For much of the
post-silicon verification of the PA 8000, the two-state cache
access presented a frequency barrier that limited the amount
of investigation beyond 180 MHz. Two main contributors
allowed the frequency of the PA 8200 to be increased well
beyond 200 MHz. The first was the new SRAM placement and
routing for the cache subsystem. The 12-SRAM configuration
yielded a new worst-case round-trip delay that was 500 ps
shorter than the 20-SRAM configuration previously used. The
second enabler was linked to the next-generation SRAMs.
Not only did these parts provide four times the density, they
also reduced their access times from 6.7 ns to 5.0 ns. The
combined benefit of these two enablers resulted in raising
the maximum cache-limited frequency from 180 MHz to
230 MHz. The value of this improvement was really twofold.
First, it enabled system-level electrical characterization and
CPU core speed path identification in a space previously
unexplored. Second, it resulted in a manufacturable product
that could meet the performance needs of our workstations.
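
A back-of-the-envelope check shows how the two savings combine. The model below is our assumption and is not stated in the article: the cache round trip must fit in two clock periods (the two-state access), and the 20-SRAM configuration had no slack left at 180 MHz. The function name is hypothetical.

```python
def cache_limited_mhz(round_trip_ns, states=2):
    """Highest clock (MHz) at which the round trip fits in `states` periods."""
    return states / round_trip_ns * 1e3


# Round-trip budget implied by a 180 MHz limit with a two-state access.
old_round_trip_ns = 2 / 180e6 * 1e9

# Savings quoted in the text: 500 ps from board routing, plus the
# SRAM access time improvement from 6.7 ns to 5.0 ns.
new_round_trip_ns = old_round_trip_ns - 0.5 - (6.7 - 5.0)

print(f"old round trip: {old_round_trip_ns:.2f} ns")
print(f"new round trip: {new_round_trip_ns:.2f} ns")
print(f"estimated limit: {cache_limited_mhz(new_round_trip_ns):.0f} MHz")
```

Under these assumptions the estimate lands a little above 220 MHz. This simple model ignores setup margins and clock skew, so it indicates the size of the gain and does not reproduce the measured limit.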

PA 8200 Performance 

Under nominal operating conditions of room temperature
and 3.3-volt power supplies, the PA 8200 is capable of
running up to 300 MHz, 70 MHz faster than its predecessor.
Table I summarizes its performance.



Table I
HP PA 8200 CPU Performance

Benchmark     Estimated Performance     Frequency

SPECint95     16.1                      230 MHz

SPECfp95                                230 MHz

Conclusion 

The HP PA 8000 RISC CPU achieved industry-leading
performance across a wide variety of applications by using an
aggressive out-of-order design and carefully balancing
hardware utilization throughout the system. The PA 8200
leverages that design, improving key areas identified by
customer needs and applications. The number of TLB and BHT
entries was increased, chip operating frequency was increased,
and the cache configuration was updated to include the latest
available SRAM technology. Together these changes improved
system performance across customer applications up to
23%, once again delivering industry-leading performance.

Design Methodologies and Circuit
Design Trade-Offs for the HP PA 8000
Processor



This paper discusses the various design methods used in the PA 8000, 
specific design techniques for the new packaging technology, the clock 
distribution scheme, cross-chip signal integrity issues, and some of the 
new tools and techniques. 



by Paul J. Dorweiler, Floyd E. Moore, D. Douglas Josephson, and
Glenn T. Colon-Bonet



The increasing demands for greater processor performance
to remain competitive in today's computer market
necessitate careful attention to the methods used in designing
processors to achieve these performance goals. Processor
designs are increasing in complexity to meet performance
goals, with such features as out-of-order execution and
superscalar operation. Design cycles are decreasing in length,
so design quality must increase as well. All of these factors
call for new design techniques to ensure continued success.

This paper will present some of the design methodologies
and choices used in the design of the HP PA 8000 CPU, the
first HP processor to implement the PA-RISC 2.0 architecture
and the first capable of 64-bit operation. The various design
methods used in the PA 8000, specific design techniques for
the new packaging technology used, the clock distribution
scheme, and cross-chip signal integrity issues will be
discussed. We will also present some of the new tools and
techniques employed by HP to ensure a high level of quality
on first silicon, based in large part on our experiences with
previous PA-RISC microprocessor designs.

Design Trade-Offs and Methodologies 

Processor design is a continuous series of trade-offs between
die area, complexity, performance, speed, power use, and
design time. Given the complexity of a four-way out-of-order
processor such as the PA 8000, it is not appropriate to employ
the same circuit design techniques for all blocks on the chip.
For the PA 8000, three major circuit design techniques were
used.

The first is the traditional static design approach, in which
all output signals are held true as long as the inputs to the
static cell remain constant. Storage of values, or state, is in
latches, and logic functions are implemented using a variety
of different logic blocks, allowing minimization of area or
path evaluation time. Since static logic is fairly immune to
noise effects (at least on a local basis), this is the safest
design approach. Frequently this is also the design approach
that needs the fewest engineering resources. The synthesis
and layout steps can be accomplished by automated tools,
with oversight by the designer to ensure that the block
satisfies requirements, timing paths are met, electrical rules
(such as metal electromigration) aren't violated, and so on.

Static design techniques are not ideally suited for large fan-in
and fan-out functions. Because of their pullup/pulldown
design, static gates are not the fastest evaluation method for
certain high fan-in/fan-out applications. Single-rail dynamic
logic or domino logic is better suited to these applications,
particularly OR functions. A good example of such a function
is the operand dump lines from register files. For an
out-of-order processor with operand data coming from both
rename and architected state registers, the number of drivers
on one bus is quite large. In the case of the PA 8000 there are
56 rename registers and 32 architected state registers on both
the integer and floating-point sides. Trying to drive a single
bus with 88 static drivers is a much more difficult task than
using single-rail dynamic logic. The lower capacitance of
simply using an n-channel FET driver and a bus precharger
for the nondump state helps tremendously in this instance.
Static logic will also consume more area to implement these
types of functions because it requires extra p-channel FET
pullup trees in each block. However, dynamic logic is more
susceptible to noise, requires more careful design attention
than static logic, will in general use more power, and since it
is a clocked mechanism, also increases the clock load. This
type of logic is employed in the data path portions of the
PA 8000.

Single-rail dynamic logic does fail in some instances,
particularly when trying to use the inversion of a value in the
middle of a logic chain, or using an AND function. In this
instance and where static logic is not fast enough, a dual-rail
dynamic logic scheme can be employed. In this type of logic,
both the positive sense and the negative sense of a signal
are derived, both in a low-go-high fashion.* Inversions are
accomplished simply by switching the low-sense and
high-sense signals between gates. This logic can be quite fast
since the design of the gates optimizes one transition edge
and dynamic techniques are employed in the pulldown trees
of the logic gates. In addition, since timing information is
included with the transition of one or the other output sense,
it is a self-timed mechanism. By employing latches that
sense just the first transition of an output pair, this type of
logic can be pipelined and used in multiple stages. Dual-rail
dynamic logic does consume a large amount of area and
power, and therefore was employed only in the most
time-critical portions of the PA 8000, most notably the
floating-point execution units.

* Low-go-high means that the signal starts at the ground
voltage and transitions only once, during an evaluate state,
to the supply voltage VDD.
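
The dual-rail idea can be illustrated at the logic level with a small Python model. It is an abstraction we introduce for explanation, not a circuit description: the `DualRail` type and helper names are hypothetical, and each rail is represented only by whether it has risen during evaluation.

```python
from typing import NamedTuple


class DualRail(NamedTuple):
    t: int  # positive-sense rail: rises when the value is 1
    f: int  # negative-sense rail: rises when the value is 0


PRECHARGED = DualRail(0, 0)  # both rails low before evaluation


def encode(bit):
    """Evaluated dual-rail form of a single bit."""
    return DualRail(bit, 1 - bit)


def invert(a):
    """Inversion is just a swap of the two rails; no gate is needed."""
    return DualRail(a.f, a.t)


def and_(a, b):
    """AND: true rail needs both inputs true; false rail needs either false."""
    return DualRail(a.t & b.t, a.f | b.f)


def or_(a, b):
    """OR: the dual of AND, with the rail roles exchanged."""
    return DualRail(a.t | b.t, a.f & b.f)


def is_done(a):
    """Self-timing: the value is valid once exactly one rail has risen."""
    return (a.t ^ a.f) == 1
```

Because every rail only ever rises during evaluation, a receiver can treat the first transition on either rail of a pair as both the data and the signal that the data is ready.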

Alpha Particle Sensitivity

The decision to use lead solder bump technology to enable
flip-chip die attach for the PA 8000 presented a new design
challenge for the team. Previous designs were all
wire-bonded dice in ceramic pin-grid array packages (CPGA).
To prevent alpha particles (which are identical to helium
nuclei) emanating from the package or wire bonds from
upsetting sensitive storage nodes within the processor, a
silicon compound is used on the die surface. The flip-chip
attach method, however, places arrays of mostly lead (Pb)
hemispherical bumps over a significant portion of the die
surface. The bump material contains some heavy elements
that are radioactive and the decay of these elements
produces alpha particles and beta and gamma rays that can
cause a single-event upset of a sensitive storage node.

The single-event upset is a high concern in integrated
circuits because a change of state of a storage node can have
serious consequences for executing programs. Any alpha
particle that leaves the solder bump has sufficient mass and
energy to cause an ionized trail of hole-electron pairs that
create mobile charges that can flood a positively charged
storage node and cause an unintended state change of a
memory element. To minimize this undesired event, certain
design changes were adopted for PA 8000 memory circuits.

A SPICE current pulse model that simulated the behavior of
an alpha particle was derived from both empirical
measurements on existing products and simulation using IC
process modeling software. A design rule for the minimum
storage charge (Qcrit) was set and all storage nodes were
designed to meet the new guideline, then verified by SPICE
simulations using the alpha particle current pulse model.
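
A minimum-charge rule of this kind can be expressed as a simple screening check run before detailed simulation. The sketch below is illustrative only: it approximates the charge on a node as capacitance times voltage swing, and the function names and all numeric values are hypothetical, not PA 8000 design data.

```python
def stored_charge_fc(cap_ff, swing_v):
    """Charge held on a node in femtocoulombs: Q = C * V (fF x V = fC)."""
    return cap_ff * swing_v


def meets_qcrit(cap_ff, swing_v, qcrit_fc):
    """True when the node stores at least the minimum required charge."""
    return stored_charge_fc(cap_ff, swing_v) >= qcrit_fc


# Hypothetical nodes: (name, capacitance in fF), checked at a 3.3 V swing
# against a made-up 50 fC rule.
nodes = [("latch_a", 20.0), ("latch_b", 10.0)]
failing = [name for name, cap in nodes if not meets_qcrit(cap, 3.3, 50.0)]
print("nodes below the rule:", failing)
```

A check like this only flags nodes that need more storage capacitance. Confirming that a node survives a strike still requires circuit simulation with the current pulse model, as described above.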

Clock Distribution Scheme

In a high-frequency design such as the PA 8000, minimizing
cross-chip clock skew is critical to ensure the maximum
amount of time for logic and data path operations to
complete. Lack of attention to clock distribution for the entire
chip will result in a lower frequency of operation and more
design resources being spent on reducing delays in budgets
that contain cross-chip paths. Excessive clock skew also
increases the likelihood of introducing races into the design
that will need to be identified and fixed. For these reasons a
considerable amount of effort was spent in the investigation
and design of the clock distribution scheme for the
PA 8000.

Also affecting clock skew across the chip is the amount of
load on the global clock signal. With single-rail and dual-rail
dynamic circuitry in the data path sections, the overall clock
load is greater than it would have been had only static
circuitry been used. This places an additional burden on the
clock distribution network because skew increases with
load for a given clock network definition.

Fig. 1. Clock distribution network.

The clock distribution method employed on the PA 8000 is
an H-tree metal structure (see Fig. 1) to deliver the clock
signal from the C4 solder bumps to a first-level on-chip clock
receiver. The output of this receiver is then routed using
matched wire lengths to a second level of clock buffers,
with each buffer carefully positioned on the chip and the
output load of each buffer matched as closely as possible.
Given the large size of the die for the PA 8000 (19.2 by
17.8 mm), process variation will inevitably make the FETs
used in these second-level clock buffers unequal in strength.
The design of these buffers attempted to minimize this
speed variation. A graph of the overall skew using the final
clock distribution scheme is shown in Fig. 2. Using this
design, the overall clock skew across the die was held to
170 picoseconds.

From the second-level clock buffers, careful attention was
paid to the routes of the buffered clock outputs to the next
level of circuitry. To minimize the power dissipation of the
chip and provide nonoverlapping clocks to control blocks,
controlled buffer blocks called clock gaters are employed.
Different types of clock gaters can generate overlapping and
nonoverlapping clocks, and each size of gater is rated for
a specific amount of fanout load. Checks were performed to
ensure that the proper loading was maintained on all gater
outputs, since the clock outputs for these gater blocks were
guaranteed to a certain specification only under a rated load
range. Whenever possible, the clock gaters were qualified
with control signals to strobe their clock outputs only when
necessary. This allows the clocks for various functional
units to be clocked only when actual work needs to be done,
reducing overall chip power dissipation.

Timing

To ensure high-frequency operation and a short
post-tape-release period, vigorous timing checks were
employed by PA 8000 block and top-level designers. The timing
effort on the PA 8000 was far greater than on previous HP
processors, and was a significant factor in producing first
silicon that ran at the targeted design frequency from the
first boot of the operating system.

The size of the die complicated top-level timing analysis
because the sheer distance some signals had to travel added
significant delay to cross-chip budgets. Over-the-block
routing was necessary given the large number of top-level
signals present on the chip. Noise and capacitance to metal
layers inside of the blocks being routed over had to be
factored into the top-level timing analysis.

Repeaters were employed on the PA 8000 for long-route,
timing-critical signals to reduce the delay and allow for faster
signal edges. In some cases this was accomplished with one
noninverting buffer, and in other cases split inverters along
the route were used. Where possible, single inverters were
used in cross-chip paths if this level of inversion could be
absorbed by the receiving or driving logic, thus speeding up
these paths.

Block designers ran timing simulators, both path-driven and
stimulus-driven, to check the internal timing of their blocks
and to verify that their published drive and receive times for
global signals were valid. Close to tape release, a large effort
was put into driving down the number of slow cross-chip
paths, which threatened the frequency goal of the PA 8000.

In addition to the timing checks performed on the PA 8000,
other quality checks were performed to detect potential
problems discovered on previous processors. The checks
will be described in the remainder of this article. Most of
these problems are related to noise events on signals and
supplies that trip sensitive circuitry, causing failures.

Latch Margin Checks 

Latches are an important part of any processor design. A
huge amount of state information about a currently running
program needs to be stored. Control logic and data paths
both employ latches to a large degree. Latch designs trade
off setup, hold, and in-to-out delay times by optimizing the
size of various FETs in the latch structure, particularly the
feedback inverter, which holds the state of the latch and
must be overcome to change the state. The PA 8000 design
employs transparent latches in which the input signal passes
through a series n-channel FET and thus suffers a gate
threshold voltage drop as well.

Since changing the state of a latch inadvertently is potentially
disastrous, avoiding poor latch designs was a critical design
goal. For this reason, a specific tool was developed to
analyze the electrical margins of a latch and was run on all the
latches on the PA 8000. The complexity of this tool grew
from a desire to be able to check both full and half latches.
A full latch consists of two cross-coupled inverters while a
half latch has a single FET connected to the inverter output
(see Fig. 3).

The latch check program evaluated the set drive path to
determine if it was strong enough to overcome the feedback
FETs. Since the input drive signal must be known to
accomplish this evaluation and extracting this drive signal from all
of the places where latches are used is a rather complex
task, the program had to make some assumptions about the
driving block when run only on the latch cell. For critical
paths or latches with particularly small margins, the actual
driving path was placed into a small schematic and the
program was run on this schematic to ensure that the latch
was acceptable.
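
As a rough illustration of the kind of comparison the latch check performs, the sketch below treats FET drive strength as proportional to width over length and compares the series combination of driver and pass FET against the feedback device. This is a deliberate simplification and an assumption on our part: the article does not say how the real tool computed margins, and the function names, device sizes, and the `MIN_MARGIN` threshold are all hypothetical.

```python
def strength(width_um, length_um, k=1.0):
    """First-order drive strength proxy: proportional to W/L (assumption)."""
    return k * width_um / length_um


def series(*strengths):
    """Strength of devices in series (conductances combine like series resistors)."""
    return 1.0 / sum(1.0 / s for s in strengths)


def latch_margin(driver, pass_fet, feedback):
    """Ratio of the set drive path strength to the feedback FET strength."""
    return series(driver, pass_fet) / feedback


MIN_MARGIN = 3.0  # hypothetical acceptance threshold, not from the article

driver = strength(12.0, 0.5)    # assumed driving-block output device
pass_fet = strength(6.0, 0.5)   # series n-channel pass FET
feedback = strength(1.0, 1.0)   # weak feedback inverter device

margin = latch_margin(driver, pass_fet, feedback)
print(f"margin = {margin:.2f}, acceptable = {margin >= MIN_MARGIN}")
```

When the tool is run on the latch cell alone, the `driver` value stands in for the assumptions about the driving block that the text mentions; for critical paths it would be replaced by the actual driving path.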

Signal Noise Checks 

In implementing the PA 8000, additional levels of
interconnect were required with finer geometries than had been
used on past designs to connect the blocks on the chip
together. This posed a number of problems in guaranteeing
that the design would be electrically robust at the high
frequencies at which the PA 8000 operates. Experience during
electrical characterization of previous designs indicated that
internal signal integrity would be a serious issue for the
PA 8000.

Signal Integrity Issues in Advanced Processes

Three major problems arise with interconnect as processes
continue their inexorable march toward smaller dimensions
and higher frequencies:

• Signal cross talk is very significant at the 0.5-µm process
generation and beyond.

• Signal rise and fall times decrease as transistor speed
increases.

• Signal coupling increases because smaller dimensions are
used for interconnect. The smaller dimensions especially
increase coupling between metal lines on the same
interconnect layer.

Signal cross talk (noise effects) includes both capacitive and
inductive components. In the equations i = C dv/dt and
v = L di/dt, all of the terms (C, L, dv/dt, and di/dt) are
increasing with decreasing interconnect dimensions and faster
transistors. This leads to voltage and current disturbances in
lines that couple to adjacent metal lines through mutual
capacitive and inductive effects. An example of an
interconnect and circuit topology that can cause these problems is
shown in Fig. 4.
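
To give a feel for the magnitudes involved, the two relations i = C dv/dt and v = L di/dt can be evaluated for some illustrative values. The numbers below (coupling capacitance, swing, edge time, inductance) are assumptions chosen only for the arithmetic and are not taken from the PA 8000. The charge-sharing estimate in the last function is a standard first-order bound for an undriven victim line, added here as an assumption rather than as the article's method.

```python
def coupling_current(c_couple_f, dv_v, dt_s):
    """Capacitive coupling current, i = C * dv/dt, in amperes."""
    return c_couple_f * dv_v / dt_s


def inductive_voltage(l_h, di_a, dt_s):
    """Inductively induced voltage, v = L * di/dt, in volts."""
    return l_h * di_a / dt_s


def victim_glitch(c_couple_f, c_ground_f, dv_v):
    """First-order glitch on an undriven victim via capacitive division."""
    return dv_v * c_couple_f / (c_couple_f + c_ground_f)


# Illustrative values only (assumptions): 50 fF of coupling, a 3.3 V swing
# in 100 ps, 0.1 nH of shared inductance carrying a 1 A step in 1 ns.
i_couple = coupling_current(50e-15, 3.3, 100e-12)
v_induct = inductive_voltage(0.1e-9, 1.0, 1e-9)
v_glitch = victim_glitch(50e-15, 150e-15, 3.3)

print(f"coupling current  = {i_couple * 1e3:.2f} mA")
print(f"inductive voltage = {v_induct:.2f} V")
print(f"victim glitch     = {v_glitch:.3f} V")
```

Halving the edge time in either relation doubles the disturbance, which is the scaling trend the paragraph above describes.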

Very fast edge rates require high transient currents (tens of
amperes) from the off-chip and on-chip power networks.
High currents are also present in the main clock network on
the chip. Power supply networks require careful design to
minimize inductive and capacitive effects on voltage levels.
Clock nets also need to maintain good voltage levels as well
as minimize clock skew delays between various blocks.

Solving Signal Integrity Problems 

Different approaches can be used to solve signal integrity
problems. In general, combinations of the following
techniques were used on the PA 8000:
• Adjust spacing of signals relative to each other
• Include shields above and below signals
• Include restoring logic (repeater) in the route
• Design signal receivers that reject noise events.

A key component of the effort to correct signal integrity
problems is a toolset that can be used to identify them in
the first place. This toolset needs the ability to do RC
extraction and the ability to identify circuit topologies that may
be susceptible to noise problems. RC extraction allows
determination of the extent of possible coupling problems.
By combining it with identification of susceptible circuits,
solutions to problems can be implemented.

To identify circuits with noise susceptibility, an existing
internal tool was heavily modified and extended to allow
easy traversal of the current schematic or artwork netlist
hierarchy. This tool could display all connections of a given
signal down to the transistor level, including information on
FET sizes and estimates of capacitive loading (from
schematics) or extracted capacitive load (from artwork).
Information on port directionality and other text properties added
by the block designer could also be displayed, as well as
what terminals of a FET are connected to the signals. One
additional important feature of the tool was that it could
track any changes in real time, as soon as they were made
by designers. This tool was used for many purposes by
designers in addition to its use in noise checks.

Fig. 5. Latch flipping caused by cross-chip noise.

The latching methodology used on the PA 8000 has a
potential failure mode: excursions of a signal beyond a supply rail
(e.g., below local ground for a given latch) could cause the
latch to lose its value. An example of this is illustrated in
Fig. 5. The latch shown is holding a high value: node IN1 is
at VDD, held by the weak feedback inverter. If the victim line
is at 0V and the culprit lines are at VDD and transition to 0V
quickly, an excursion of the input signal below local ground
is possible, induced by capacitive coupling from the culprit
lines to the victim line as the culprit lines transition from
1 to 0. This input signal excursion can cause the n-channel
FET pass gate that serves as the input to the latch to turn on
even though its gate is held at 0V (VGS for the transistor is
greater than VT). This is because the victim input is
temporarily below local ground. With this n-channel FET pass gate
on, the latch can spuriously flip the value it was holding
by discharging the IN1 node if the transient is enough to
overcome the feedback inverter and trip the forward inverter.
This type of failure may change the state of the chip, and is
a serious problem that must be avoided.

Other possible problem circuits were also identified by this
tool, including heavily ratioed* combinations of p-channel
FETs and n-channel FETs and long routes connected to gate
inputs of pass FET latches. However, diffusion-connected
inputs were the most common problems. To identify
diffusion-connected inputs, the netlist traversal tool was run on
every top-level signal in the design. The tool identified
top-level signals connected to the source or drain of a pass FET
in a latch. This gave a textual report of all connections down
to the FET level for every top-level signal, in addition to the
FET terminal connections and whether the signal was an
input or output of that particular leaf cell.

Once the report was generated, a parser analyzed the
connectivity to determine if any signal connected to a FET diffusion
was also an input (outputs of leaf cells were ignored). Other
checks were performed for additional suspect circuit
topologies. When potential problem signals were identified, the
information was integrated with RC extraction results to
determine priorities for fixing signals, and the results were
distributed to designers to give them feedback on which
signals in their blocks needed to be fixed. Extensive
simulations showed that only routes longer than a specified
threshold length would need to be fixed. This threshold
gave designers a limit at which they would have to do
something to reduce susceptibility to noise on a signal being
received by their block.

* Heavily ratioed combinations are combinations of inverters and other FETs in which the
effective p-channel FET drive strength is significantly different from the effective n-channel
FET drive strength.
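
The parsing step described above can be sketched as follows. The format of the textual report is not given in the article, so the `Connection` record, the signal names, and the threshold value are hypothetical; the logic follows the text: flag any signal that reaches a FET source or drain while being an input of the leaf cell, ignore outputs, then keep only routes longer than the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One line of the (hypothetical) netlist traversal report."""
    signal: str     # top-level signal name
    leaf_cell: str  # leaf cell in which the connection was found
    terminal: str   # FET terminal: "gate", "source", or "drain"
    direction: str  # "input" or "output" of that leaf cell


def diffusion_connected_inputs(report):
    """Signals that are leaf-cell inputs and touch a FET diffusion."""
    return {
        c.signal
        for c in report
        if c.direction == "input" and c.terminal in ("source", "drain")
    }


def needs_fix(flagged, route_length_um, threshold_um):
    """Flagged signals whose extracted route length exceeds the threshold."""
    return sorted(s for s in flagged if route_length_um.get(s, 0.0) > threshold_um)


# Hypothetical report and route lengths.
report = [
    Connection("bus_a", "latch1", "source", "input"),   # diffusion input
    Connection("bus_b", "latch2", "gate", "input"),     # gate input: not flagged
    Connection("bus_c", "drv1", "drain", "output"),     # output: ignored
    Connection("bus_d", "latch3", "drain", "input"),    # diffusion input, short
]
lengths_um = {"bus_a": 4200.0, "bus_b": 5000.0, "bus_c": 3000.0, "bus_d": 300.0}

flagged = diffusion_connected_inputs(report)
to_fix = needs_fix(flagged, lengths_um, threshold_um=1000.0)
print(sorted(flagged), to_fix)
```

In the real flow the length filter would be driven by RC extraction results rather than a single length table; the table here only stands in for that data.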

In most cases, designers used one of the techniques
described above to alleviate these noise problems. The most
popular solution inserted a restoring inverter in front of the
pass gate and modified the latch slightly to make it logically
equivalent to the latch that needed to be replaced, as shown
in Fig. 6. The restoring inverter in front of the pass FET
makes the latch far more immune to noise events on the
input. At other times, repeaters (inverters and buffers) were
inserted in routes to cut down the distance of the route, thus
reducing the susceptibility of a given line to transitions by its
neighbors.

Signal Integrity Results

This tool was instrumental in
eliminating noise-induced electrical failures in the PA 8000
design, and probably saved several months of
characterization to investigate noise failures that would have existed had
this tool not been developed. Over 700 potential problems
were flagged with the first run of the tool. All of these
problems were investigated and either fixed or waived before
tape release. The PA 8000 was a very electrically robust
design given its complexity level when silicon was received.

One drawback of this tool was that it was only run on
top-level signals. Since some of the blocks on the PA 8000 were
very large, long routes and therefore noise problems could
also be embedded inside blocks. One such problem was
found during characterization of the chip at the block level.
We are currently extending the noise analysis tools to
operate at deeper levels throughout the chip hierarchy to
thoroughly check all signals on the chip. RC extraction is being
extended to allow deeper levels of extraction without long
run times, and inclusion of inductive effects is also being
investigated.

A limitation of this type of tool is that it can generate a lot
of noise, that is, report problems that really aren't problems.
This affects designer productivity because the problems
reported by the tool must be investigated. However, the
penalty and cost for finding a noise problem in a design can
be very high, especially late in the characterization process,
so effort spent early to eliminate possible problems is very
worthwhile. We are currently developing more advanced
tools to eliminate some of this noise and make sure that only
problems serious enough to warrant fixing are included.

Block Quality Checks

Block design, especially for complex blocks, is a
time-consuming process in which, despite the best intentions
of the designer, problems can sneak through without being
noticed. For this reason several additional tools were
developed to allow designers to check for potentially troublesome
circuits in their blocks.



One tool checks for so-called "ugly" polysilicon structures.
Given the resistance of the polysilicon layer in the HP
process used to fabricate the PA 8000, long polysilicon routes
are undesirable and can cause numerous problems, chief
among these being slow speed. Standard cell routed blocks
suffered less from this problem because the routers
employed used only metal layers for signal interconnect. Long
polysilicon problems occurred primarily in semicustom and
full-custom designs. This tool flagged polysilicon routes
between 25 and 50 micrometers long as warnings and over
50 micrometers as errors.
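
The polysilicon length rule lends itself to a very small sketch. The two thresholds come from the text; the function name, the route names, and the treatment of routes exactly 25 or 50 micrometers long (both classed as warnings here) are assumptions.

```python
WARN_UM = 25.0   # routes at or above this length are warnings (from the text)
ERROR_UM = 50.0  # routes above this length are errors (from the text)


def classify_poly_route(length_um):
    """Classify a polysilicon route by length, in micrometers."""
    if length_um > ERROR_UM:
        return "error"
    if length_um >= WARN_UM:
        return "warning"
    return "ok"


# Hypothetical extracted route lengths for one block.
routes_um = {"n12": 8.0, "n47": 31.5, "n90": 50.0, "n133": 72.0}
results = {name: classify_poly_route(length) for name, length in routes_um.items()}
print(results)
```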

With the significant use of clock gaters to create many
different flavors of clocks, both overlapping and
nonoverlapping, races were expected to be more prevalent in the
PA 8000 design. Pass-gate blocks in particular cause these
types of problems. Clock-qualified signals (signals derived
from clock edges) driving other clock-qualified nodes were
checked to cover signal races not detectable by the previous
race checking methodology used in PA-RISC processor
designs.

Summary 

All of the techniques described above helped to make the
PA 8000 processor a successful project, achieving its
frequency, performance, and aggressive post-tape-release
schedule. This was a great achievement given the sheer
complexity of the design, the fact that it was a new
processor architecture, and the number of new technologies
employed in the design. This success is due in large part to
the design methodologies used for this processor, particularly
the new methodologies developed for the PA 8000 design.
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Functional Verification of the HP 
PA 8000 Processor 



The advanced microarchitecture of the HP PA 8000 CPU has many features 
that presented significant new verification challenges. These include 
out-of-order instruction execution, register renaming, speculative 
execution, four-way superscalar operation, decoupled instruction 
fetch, concurrent system bus interface, and PA-RISC 2.0 architecture 
enhancements. Enhanced functional verification tools and processes were 
required to address this microarchitectural complexity.



by Steven T. Mangelsdorf, Raymond P. Gratias, Richard M. Blumberg,
and Rohit Bhatia



Computer system performance has been improving rapidly
at a rate of 40 percent or more per year. This growth rate has
been fueled by several factors. Advancements in integrated
circuit technology have made higher microprocessor clock
rates and larger caches possible. There have been
contributions from system software as well, such as compilers that
emit more efficient machine code to realize a given function.
The PA-RISC instruction set architecture has evolved to keep
pace with changes in technology and customer workloads.

These factors alone, however, would not have been sufficient
to satisfy customer demand for increased performance in a
very competitive industry. The balance has been made up by
innovations in microarchitecture that increase the amount of
useful work that a microprocessor performs in a clock cycle.
This has increased the complexity of the design and thus the
effort required for successful functional verification.

Many of our previous microprocessor projects have reused
existing cores (although generally with significant
modifications and enhancements). In contrast, the HP PA 8000 CPU
has a new microarchitecture that borrows little from
previous projects. Some of the features in its microarchitecture
presented significant new verification challenges:

• Out-of-Order Execution. A 56-entry queue of pending
instructions is maintained by an instruction reorder buffer (IRB).
The queue hardware selects instructions for execution that
have their operands available irrespective of program order.

• Register Renaming. Write-after-write and write-after-read
ordering dependencies are eliminated by remapping
references from an architected register to a temporary register.

• Speculative Execution. The PA 8000 predicts whether a
branch is taken and can tentatively execute instructions
down the predicted path. The side effects of all such
instructions must be canceled if the prediction turns out
to be incorrect.

• Four-Way Superscalar Operation. The PA 8000 has ten
functional units and can sustain an execution rate of four
instructions per cycle.

• Decoupled Instruction Fetch. Instructions are fetched and
inserted into the queue by an autonomous instruction fetch
unit (IFU). The IFU performs branch prediction and caches
the target addresses of recently taken branches in a branch
target address cache (BTAC).

• Concurrent System Bus Interface. Memory requests can be
issued out of order, and data returns can be accommodated
out of order. Up to 16 requests can be outstanding at a time.

• PA-RISC 2.0 Architecture Enhancements. These provided
important new capabilities, such as 64-bit addressing and
computation, but they necessitated tool rework and limited
reuse of existing test cases.

This paper describes the enhanced functional verification
tools and processes that were required to address the
daunting microarchitectural complexity of the PA 8000.

Verification Overview 

The purpose of functional verification is to identify defects
in the design of a microprocessor that cause its behavior to
deviate from what is permitted by the specification. The
specification is the PA-RISC instruction set architecture
and the bus protocols established by industry standards or
negotiated with the designers of other system components.
Performance specifications, such as instruction scheduling
guidelines committed to the compiler developers, may also
be considered.

Although it is not possible to prove the correctness of a
microprocessor design absolutely through exhaustive
simulation or existing formal verification techniques, a functional
verification effort must achieve two things to be considered
successful. First and foremost, it must provide high
confidence that our products will meet the quality expectations
of our customers. At the same time, it must identify defects
early enough in the design cycle to avoid impacting the
product's time to market.

A typical defect caught early in the design cycle might cost
only one engineering day to debug and correct in the RTL
(Register Transfer Language). Close to tape release, it might
take five to ten days to modify transistor-level schematics
and layout, modify interblock routing, and repeat timing
analysis. Therefore, tape release can be delayed if the defect
rate is not driven down quickly.

After tape release, lost calendar time is the primary cost of
defects because the time required to fabricate a new revision
of the design is at best a few weeks and at worst a few
months. Defects that are so severe that they block a
software partner's development, tuning, or testing efforts can
put them on the critical schedule path of the product. The
worst-case scenario is a masking defect that blocks further
testing efforts for a certain functional area of the design, and
this delays the discovery of additional defects by the time
required to fabricate a new revision. One or more masking
defects in series can quickly devastate the product schedule.

The PA 8000 verification effort consisted of a presilicon phase
and a postsilicon phase. The purpose of the presilicon phase
was to find defects concurrently with the design, when the
cost of correcting them was small, and to drive up the quality
level at first tape release so that the first prototypes would
be useful to our software partners. This was done using
three tactics: RTL simulation, accelerated simulation, and
switch-level simulation. The postsilicon effort consisted of
aggressive characterization of hardware prototypes to
complete verification before systems were shipped to customers.
Also, performance verification was done at various stages in
the project.

RTL Simulation 

Most previous PA-RISC microprocessor projects have built
their functional verification efforts around an internally
developed RTL simulator that compiles RTL descriptions
of blocks into C code, which is then compiled with HP's
C compiler. Block execution is scheduled dynamically using
an event-driven algorithm. This simulation technology
achieves modest performance (about 0.5 Hz running on a
typical workstation), but it does provide capabilities for rapid
prototyping such as the ability to simulate very high-level
RTL and quick model builds. Therefore, our RTL simulator
became the cornerstone of our verification effort early in
the design.

Fig. 1 shows the verification environment used for RTL
simulation. There are four basic components in the environment:

• The RTL model for the PA 8000

• Bus emulators, which can apply interesting stimulus to the
input buses of the PA 8000 including responses to its
transactions. We included emulators for all components sharing
the system bus including the memory system, I/O adapter,
and third-party processors.

• Checking software, which monitors the behavior of the
PA 8000 and verifies that it complies with the specifications.
This also helps speed debugging by flagging behavioral
violations as soon as they occur.

• A variety of test case sources and tools that can compile
the test cases into an initial state for the PA 8000 model and
configure the bus emulators.

Checking Software

The most important check is a thorough comparison between
instructions retiring in the PA 8000 model and instructions
retiring in the PA-RISC architectural simulator. Retiring
means exiting the instruction reorder buffer, or IRB (see
article, page 8). A tool called the depiper captures
information about each instruction retiring in the PA 8000 model,
including what resources (such as destination registers)
are being modified and the new values. The synchronizer
compares this with similar information obtained from the
PA-RISC architectural simulator, which is also running the
same test case. This provides very high confidence that the
PA 8000 complies with the basic PA-RISC instruction set
architecture. A final-state comparison of all processor and
memory state information is also done at the end of each
test case.

The depiper also provides the synchronizer with information
about architecturally transparent events such as cache
misses. Using this information, the synchronizer can perform
strong checks in the areas of cache coherency, memory
access ordering consistency, and memory-to-cache transfers.
In addition, a number of checkers were developed for other
areas:

• A checker for the instruction queues, including whether
the order in which instructions are sent to functional units
complies with data dependencies

• A checker for protocol violations on the system bus

• A checker for the bus interface block, discussed in more
detail below

• A checker that detects unknown (X) values on internal
nodes.
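
The retire-by-retire comparison performed by the synchronizer can be illustrated with a minimal sketch. The record layout below is an assumption; the article states only that the depiper reports, for each retiring instruction, which resources are modified and their new values, and that the synchronizer compares this against the architectural simulator running the same test case.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class RetireRecord:
    """One retired instruction (hypothetical layout)."""
    pc: int       # address of the retiring instruction
    writes: dict  # resource name -> new value, e.g. {"r5": 42}


def synchronize(model_stream, reference_stream):
    """Return None if the streams agree, else (index, model_rec, ref_rec).

    zip_longest is used so that one stream ending early is also reported.
    """
    pairs = zip_longest(model_stream, reference_stream)
    for index, (model_rec, ref_rec) in enumerate(pairs):
        if model_rec != ref_rec:
            return index, model_rec, ref_rec
    return None


# Hypothetical streams: the second retire writes a different value to r2.
model = [RetireRecord(0x1000, {"r1": 5}), RetireRecord(0x1004, {"r2": 7})]
reference = [RetireRecord(0x1000, {"r1": 5}), RetireRecord(0x1004, {"r2": 9})]

mismatch = synchronize(model, reference)
print(mismatch)
```

Reporting the first mismatching retire, rather than only comparing final state, reflects the point made in the text that flagging violations as soon as they occur speeds debugging.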

Test Case Sources 

A test case is essentially a test program to be run through the
RTL model of the processor to stress a particular area of
functionality. These are generally written in a format similar
to PA-RISC assembly language, with annotations to help
specify initial cache and TLB contents. In addition, a control
file can be attached to a test case to specify the behavior
of the bus emulators. The emulators have useful default
behavior, but if desired the control files can precisely
control transaction timing.

A test case is compiled using a collection of tools that
includes the PA-RISC assembler. The result of the compilation
is a set of state initializations for the RTL model. These
include the processor registers, caches, TLB, and memory.
In addition, the bus emulators are initialized with the
commands they will use during execution of the test case.

Previous PA-RISC microprocessor projects had built up a
library of test cases and architectural verification programs
(AVPs). Although we did run these, it was clear from the
beginning that a large source of new cases would be
required. The existing cases were very short, so their ability
to provide even accidental coverage for a machine with a
56-entry IRB was questionable. Moreover, we needed cases
that targeted the unique microarchitectural features of the
PA 8000.

We developed a test case template expander to improve
our productivity in generating the large number of cases
required. An engineer could write a test template specifying
a fundamental interaction, and the tool would expand this
into a family of test cases. Some of the features of this tool
included:

• The ability to sweep a parameter value. This was often used
to vary the distance between two interacting instructions.

• The ability to fill in an unspecified parameter with a random
value.

• An if construct, so that a choice between two alternatives
could be conditional on parameters already chosen.

• Instruction groups, so that an instruction could be specified
that had certain characteristics without specifying the exact
instruction.
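
The four expander features listed above can be sketched in a few lines. The template format of the real tool is not described in the article, so the dictionary layout, the `INSTRUCTION_GROUPS` table, and the pseudo-assembly strings below are all hypothetical and are not valid PA-RISC syntax; they only show how a sweep, a random fill, an if construct, and an instruction group might combine.

```python
import itertools
import random

# Hypothetical instruction groups: pick any member with the named property.
INSTRUCTION_GROUPS = {"int_alu": ["ADD", "SUB", "OR"], "load": ["LDW", "LDD"]}


def expand(template, seed=0):
    """Expand one template into a family of test cases (lists of lines)."""
    rng = random.Random(seed)  # seeded so that a family can be regenerated
    specs = template["params"]
    sweep_names = [n for n, s in specs.items() if s[0] == "sweep"]
    sweep_values = [specs[n][1] for n in sweep_names]
    cases = []
    for combo in itertools.product(*sweep_values):  # one case per sweep point
        params = dict(zip(sweep_names, combo))
        for name, spec in specs.items():
            if spec[0] == "random":   # fill in an unspecified value
                params[name] = rng.randint(spec[1], spec[2])
            elif spec[0] == "group":  # any instruction from a group
                params[name] = rng.choice(INSTRUCTION_GROUPS[spec[1]])
        cases.append(template["body"](params))
    return cases


def body(p):
    """Producer, a swept number of fillers, then a dependent consumer."""
    lines = [f"LDW 0(r{p['base']}), r5"]
    lines += ["NOP"] * p["distance"]
    # "if" construct: the choice depends on a parameter already chosen.
    consumer = p["consumer"] if p["distance"] < 2 else "ADD"
    lines.append(f"{consumer} r5, r6, r7")
    return lines


template = {
    "params": {
        "distance": ("sweep", range(0, 4)),
        "base": ("random", 1, 4),
        "consumer": ("group", "int_alu"),
    },
    "body": body,
}
cases = expand(template, seed=1)
for case in cases:
    print(case)
```

Sweeping `distance` mirrors the use the text mentions: varying the separation between two interacting instructions so that the interaction is exercised at every pipeline offset of interest.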

We also used the pseudorandom code generator and test
coverage measurement techniques discussed below in the
RTL simulation environment. To improve our coverage of
multiprocessor functionality, we configured our bus
emulators to generate random (but interacting) bus traffic.

Structural Verification

A block can be described by a single large RTL procedure
or by a schematic that shows the interconnection of several
smaller blocks, each of which is described by RTL. At the
beginning of the project, RTL tends to be written at a high
level because it can simulate faster and is easier to write,
debug, and maintain when the design is evolving rapidly.
Block designers, however, have a need to create schematics
for their blocks, so there is a risk that these will diverge
from the RTL reference.

We considered three strategies to verify that the two
representations of the block were equivalent. The first, formal
verification, was not pursued because the required tools
were not yet available from external vendors. The second
strategy was to rely on the switch-level verification effort.
This was unattractive because defects would be found too
late in the design cycle, and the planned number of vectors
to be run might not have provided enough coverage. The
strategy selected was to retire the higher-level RTL
description and replace it in the RTL model with the lower-level
representation. The more timely and thorough verification
that this provided compensated for some disadvantages,
including slower simulation and more difficulty in making
changes. We also used this strategy selectively, relying on
switch-level simulation to cover regular blocks such as data
paths with little risk.

Divide and Conquer

In any large design effort, one faces a choice of whether to
verify components individually, together, or both. Verifying
a component separately has several potential advantages.
Simulation time is greatly reduced. Input buses can be
directly controlled, so effort need not be expended
manipulating the larger model to provide interesting stimulus. Finally,
dependencies between subprojects are eliminated.

For separate verification to succeed, the interfaces to other
components must be very well-specified and clearly
documented. Investments must be made in a test jig to provide
stimulus to the component and in checking software to
verify its outputs. In addition, some portion of the
verification must be repeated with all components integrated to
guard against errors in the specifications or different
interpretations of them.

The PA 8000's bus interface block was particularly
well-suited to separate verification. The block had clean external
interfaces but contained a lot of complexity, including the
hardware to manage multiple pending memory accesses.
A software checking tool was written to monitor the block's
interfaces and verify its operation. Checking that a request
on one bus ultimately results in a transaction on the other
bus is a simple example of numerous checks performed by
this tool. A very low defect rate demonstrated the success
of the divide-and-conquer strategy for this block.

Most of our remaining verification effort was focused on the
complete PA 8000. As a final check, a system-level RTL model
was built that included several processors, the memory
controller, the I/O adapter, and other components. Although
throughput was very low, basic interactions between the
components were verified using this model.

24 August 1997 Hewlett-Packard Journal

© Copr. 1949-1998 Hewlett-Packard Co.

Accelerated Simulation 

The speed of the RTL simulator was adequate to provide
quick feedback on changes and for basic regression testing,
but we lacked confidence that on a design as complex as the
PA 8000 it would be sufficient to deliver an adequate quality
level. We saw a strong need for a simulation capability that
was several orders of magnitude faster so that we could
run enough test cases to ferret out more subtle defects. We
considered two technologies to provide this: cycle-based
simulation and in-circuit emulation.

Cycle-based simulation provides a much faster software
simulation of the design. With an event-driven simulator
such as our RTL simulator, a signal transition causes all
blocks that the signal drives to be reexecuted, and any
transitions on the outputs of these blocks are similarly
propagated until all signals are stable. The overhead to process
every signal transition, or event, is fairly high. Cycle-based
simulators greatly improve performance by eliminating this
overhead. The design is compiled into a long sequence of
Boolean operations on signal values (AND, OR, etc.), and
execution of this sequence simulates the operation of the
logic in a clock cycle. The name cycle-based simulator
comes from the fact that the signal state is only computed at
the ends of clock cycles, with no attempt to simulate
intermediate timing information. Our investigation revealed that
speedups of 500 times were possible, so a simulation farm
of 100 machines could have a throughput on the order of
25,000 Hz. The biggest drawback of this strategy was that
cycle-based simulators were not yet available from external
vendors.
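
As a rough check on the throughput estimate above, the arithmetic can be written out in Python. The per-machine RTL rate is not stated in the text; the value of about 0.5 Hz below is only what the quoted figures (500 times speedup, 100 machines, 25,000 Hz) imply, and the function names are illustrative.

```python
def farm_throughput_hz(per_machine_rtl_hz, speedup, machines):
    """Aggregate simulated clock rate of a farm of cycle-based simulators."""
    return per_machine_rtl_hz * speedup * machines


def implied_rtl_rate_hz(farm_hz, speedup, machines):
    """Per-machine event-driven rate implied by a quoted farm throughput."""
    return farm_hz / (speedup * machines)


# Figures quoted in the text: 500x speedup, 100 machines, about 25,000 Hz.
rtl_hz = implied_rtl_rate_hz(25_000, speedup=500, machines=100)  # inferred, not stated
total_hz = farm_throughput_hz(rtl_hz, speedup=500, machines=100)
print(f"implied RTL rate: {rtl_hz} Hz per machine, farm: {total_hz} Hz")
```

The farm figure is an aggregate over independent jobs: it measures how many simulated cycles are completed per second in total, not how quickly any single test case finishes.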

With in-circuit emulation, the gates in a Boolean representation
of the design are mapped onto a reconfigurable array
of field-programmable gate arrays (FPGAs). The design is
essentially built using FPGAs, and the emulated processor
is connected to the processor socket in an actual system.
The clock rate of the emulation system is on the order of
300,000 Hz, so very high test throughput is possible. It is
even possible to boot the operating system. Unfortunately,
there were many issues involved in using in-circuit emulation
successfully:

• Custom printed circuit boards would have to be designed
for the caches, large register files, and any other regular
structures that consume too much emulation capacity.
Changes in the design would be difficult to accommodate.

• A system was needed to exercise the emulated processor,
including a memory controller and I/O devices. Firmware
and hardware tinkering would have been needed to make
this system functional at the slow clock rates required by
the emulation system.

• Productivity was reduced by long compile times and limited
observability of internal signals. Only one engineer at a time
could use the system for debugging.

• The strategy was difficult to extend to multiprocessor
testing. It was prohibitively expensive to emulate multiple
processors. We planned to use a software emulator to
create third-party bus traffic and verify the processor's
responses, but there was a risk that the software's
performance would throttle the emulation system's clock rate.

• The emulation system was a very large capital investment.

We were quite wary of in-circuit emulation since its use on
a previous project had failed to make a significant contribution
to functional verification. We were also willing to give
up the performance advantage of in-circuit emulation to
avoid tackling the ease-of-use issues. The decision to use
cycle-based simulation would have been simple except that
it meant that we would have to develop the simulator
ourselves. R&D organizations in HP are challenged to focus on
areas of core competency and look to external vendors to
fulfill needs such as design tools that are common in the
industry. We did select cycle-based simulation because we
were confident that its lower risk and higher productivity
would translate into a competitive advantage.

We were careful to reuse components wherever possible
and to limit the scope of the project to providing the tool
functionality required to verify the PA 8000. We did not
attempt to create a simulation product useful to other
groups within HP. This turned out to be a good decision
because comparable tools have recently started to become
available from external vendors.

Cycle-Based Simulation Compiler

The cycle-based simulation compiler operates only on simple
gate-level primitives such as logic gates and latches, so
higher-level RTL must first be synthesized into a gate-level
equivalent. We had to develop our own translator for this
because the RTL language used by our RTL simulator
was defined before the industry standardization of such
languages. Another simplification is that signal values are
limited to 0 and 1, with no attempt to model an unknown (X)
state.

Fig. 2 shows a simple example circuit, a two-bit counter,
that we will use to illustrate the compilation process. The
user must describe to the compiler information about the
circuit's clocks. The clock cycle is broken down into two or
more phases, with the state of the clocks fixed during each
phase. This circuit has a clock cycle of two phases, and the
clock (CLK) is low during the first phase and high during the
second phase.

The compiler uses this information to determine which
gates need to be evaluated during each phase. This is done
in two steps. First, for each phase, the compiler propagates
the clock values into the circuit. This uses simple rules of
Boolean logic, such as the fact that the output of an AND gate
with a zero input must be zero. The goal is to identify latches
with a zero control, which are therefore provably opaque
during that phase. Next, again for each phase, the compiler
finds all gates that can be reached from a clock or other
input through a path that does not contain an opaque latch.
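
The first of these two steps can be sketched in Python as a three-valued constant propagation (0, 1, or unknown). This is only an illustration of the idea, not the compiler described in the article; the netlist format, the gate set, and the signal names (CLK, EN, NCLK, GCLK, master, slave) are all assumptions made for the example.

```python
def eval_gate(kind, inputs):
    """Evaluate a gate over the values 0, 1, or None (unknown in this phase)."""
    if kind == "AND":
        if any(v == 0 for v in inputs):
            return 0  # a zero input forces a zero output
        return 1 if all(v == 1 for v in inputs) else None
    if kind == "OR":
        if any(v == 1 for v in inputs):
            return 1
        return 0 if all(v == 0 for v in inputs) else None
    if kind == "NOT":
        return None if inputs[0] is None else 1 - inputs[0]
    raise ValueError(f"unsupported gate: {kind}")


def opaque_latches(gates, latch_controls, clock_values):
    """Return the latches whose control is provably 0 for the given clock state.

    gates: {output: (kind, [inputs])}; latch_controls: {latch: control signal}.
    """
    known = dict(clock_values)
    changed = True
    while changed:  # repeat until no further constants can be derived
        changed = False
        for out, (kind, ins) in gates.items():
            if out not in known:
                value = eval_gate(kind, [known.get(i) for i in ins])
                if value is not None:
                    known[out] = value
                    changed = True
    return {latch for latch, ctl in latch_controls.items() if known.get(ctl) == 0}


# Hypothetical clocking: an inverted clock and a clock gated by an enable.
GATES = {"NCLK": ("NOT", ["CLK"]), "GCLK": ("AND", ["CLK", "EN"])}
LATCHES = {"master": "NCLK", "slave": "GCLK"}
print(opaque_latches(GATES, LATCHES, {"CLK": 0}))  # phase 1, CLK low
print(opaque_latches(GATES, LATCHES, {"CLK": 1}))  # phase 2, CLK high
```

In the second phase the gated clock depends on the unknown enable, so the slave latch cannot be proved opaque and its downstream gates would have to be evaluated. The second step, the reachability search, would then be an ordinary graph traversal that refuses to pass through the latches returned here.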

Fig. 2. Cycle-based simulation compilation example.

Next, a sequence of Boolean operations is emitted
corresponding to the gates in each phase. Because we used
PA-RISC machines for simulation, the sequences were actually
output in PA-RISC assembly language. The sequences
totaled more than two million instructions for the PA 8000
design. The gates are ordered in sequence so that a gate is
not emitted until its inputs have been computed. Cycles, or
loops, in the circuit are handled by looping through the gates
in the cycle until all circuit nodes are stable.
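
The ordering rule (no gate is emitted before its inputs have been computed) amounts to a topological sort of the netlist. The Python sketch below shows one way to do it under simplifying assumptions: the netlist format and signal names are hypothetical, the output is pseudo-code text rather than PA-RISC assembly, and combinational loops are rejected instead of being iterated to stability as the article describes.

```python
def levelize(gates, primary_inputs):
    """Order gate outputs so that every gate follows all of its inputs.

    gates: {output: (operation, [inputs])}.
    """
    ready = set(primary_inputs)
    pending = dict(gates)
    order = []
    while pending:
        emitted = [out for out, (_, ins) in pending.items()
                   if all(i in ready for i in ins)]
        if not emitted:
            # The article iterates loops until stable; this sketch does not.
            raise ValueError("combinational loop among: " + ", ".join(pending))
        for out in emitted:
            order.append(out)
            ready.add(out)
            del pending[out]
    return order


def emit(gates, order):
    """Render the ordered gates as one Boolean operation per line."""
    return [f"{out} = {gates[out][0]}({', '.join(gates[out][1])})" for out in order]


# Hypothetical next-state logic, deliberately listed out of order.
GATES = {
    "d1": ("XOR", ["q1", "q0"]),
    "y": ("AND", ["d1", "d0"]),
    "d0": ("NOT", ["q0"]),
}
ORDER = levelize(GATES, ["q0", "q1"])
print("\n".join(emit(GATES, ORDER)))
```

Because the resulting sequence is fixed at compile time, the simulator executes it straight through on every cycle with no event queue, which is where the speedup over event-driven simulation comes from.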

Numerous optimizations are done on the output assembly
language sequences:

• Clock signals have known values during each phase, which
can be propagated into the circuit. These constant values
can simplify or eliminate some of the Boolean operations.

• The 32 PA-RISC registers are used to minimize loads and
stores to memory. Boolean operation scheduling and victim
register selection are employed to minimize the number of
loads and stores.

• The compiler can determine which circuit nodes carry
information from one phase to the next. The remaining nodes are
temporaries whose values need not be flushed to memory
after their final use within a phase.

• To eliminate NOT operations corresponding to inverting
gates, the compiler can represent nodes in inverted form
and perform DeMorgan transformations of Boolean
operations (e.g., NOT-AND is equivalent to OR-NOT).

• Aliasing of circuit nodes is done to eliminate code for
simple buffers and inverters.

Any one of the Boolean operations in the output assembly
language sequence operates on all 32 bits of the PA-RISC
data path, as shown in Fig. 3. We make use of this parallelism
to run 32 independent test cases in parallel. This is possible
because the simulator always executes exactly the same
sequence of assembly language instructions regardless of
the test case (assuming the circuit being simulated is the
same). This does not reduce the time to solution for a given
test case, but it does increase the effective throughput of the
simulator by 32 times. This was still very useful because our
verification test suites are divided into a vast number of fairly
short test cases.
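
The multislot idea can be illustrated with ordinary bitwise operations on 32-bit words, where bit i of every signal belongs to test case i. The Python sketch below simulates a two-bit counter in 32 slots at once. The enable input is an assumption added so that the slots can behave differently; it is not taken from Fig. 2.

```python
MASK = 0xFFFFFFFF  # 32 slots, one per bit of a 32-bit word


def step(q1, q0, enable):
    """Advance a two-bit counter by one clock cycle in all 32 slots at once."""
    carry = q0 & enable  # slots whose low bit rolls over this cycle
    return (q1 ^ carry) & MASK, (q0 ^ enable) & MASK


def slot_value(q1, q0, slot):
    """Extract the counter value seen by a single slot."""
    return (((q1 >> slot) & 1) << 1) | ((q0 >> slot) & 1)


# Slots 0 and 2 count on every cycle; slot 1 is held disabled.
q1, q0 = 0, 0
for _ in range(3):
    q1, q0 = step(q1, q0, enable=0b101)

print([slot_value(q1, q0, s) for s in range(3)])
```

Each `&` or `^` above costs the same whether one slot or all 32 are in use, which is why the technique raises throughput without shortening the run time of any individual test case.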

The compiler allows the user to write C++ behavioral
descriptions of blocks such as memories and register files that
are not efficient to represent using gate-level primitives. The
compiler automatically schedules the calls to this C++ code,
and an API (application programming interface) gives the
code access to the block's ports.

Pseudorandom Testing 

We had learned from previous projects that the type of
defects likely to escape the RTL simulation effort would
involve subtle interactions among pending instructions and
external bus events. With up to 56 instructions pending
inside the processor and a highly concurrent system bus with
multiprocessing support, it is not possible to count, much
less fully test, all of the interactions that might occur. We
believed that the value of handwritten test cases and test
cases randomly expanded from templates was reaching
diminishing returns, even with the low simulation throughput
achievable with the RTL simulator.

We had also learned that pseudorandom code generators
were a very effective means of finding these kinds of
defects. Such a program generates a pseudorandom sequence
of instructions that use pseudorandom memory addresses
and pseudorandom data patterns. However, it is important
that the program make pseudorandom selections in a manner
that considers the microarchitecture of the processor
and the kinds of interaction defects that are likely to occur.

Selecting memory addresses is a good example. Memory
addresses are 64 bits wide. If they were selected truly
randomly, reusing the same address within a test case would be
an impossibly rare event. This would fail to stress important
aspects of the machine, such as the logic that detects that a
load is dependent on a preceding store with the same
address. There are hundreds of selections that a generator
makes in which the microarchitecture must be carefully
considered.
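
A minimal Python sketch of biased address selection follows. It is not the generator described in the article: the function name, the reuse probability, and the 8-byte alignment are assumptions chosen only to show how deliberate reuse makes load-after-store dependencies common rather than impossibly rare.

```python
import random


def pick_address(rng, recent, reuse_probability=0.5, align=8):
    """Pick a 64-bit address, deliberately reusing earlier ones some of the time."""
    if recent and rng.random() < reuse_probability:
        return rng.choice(recent)  # provoke a dependency on an earlier access
    addr = rng.getrandbits(64) & ~(align - 1)  # fresh, aligned 64-bit address
    recent.append(addr)
    return addr


rng = random.Random(1)  # fixed seed so that a failing case can be regenerated
recent = []
addresses = [pick_address(rng, recent) for _ in range(200)]
print(f"{len(set(addresses))} distinct addresses in {len(addresses)} accesses")
```

In a real generator the reuse probability would be one of the knobs exposed in the control file, and reuse would likely also target neighboring addresses that share a cache line or cache index.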

We chose to target the cycle-based simulation environment
for a new pseudorandom code generator. Our pseudorandom
code generator was carefully tuned for the microarchitecture
of the PA 8000 and included support for the new PA-RISC 2.0
instruction set. Hundreds of event probabilities could be
specified by a control file to provide engineering control
over the types of cases being generated.



Fig. 3. Multislot cycle-based simulation. (Figure labels: source
registers, destination register, 32 simultaneous simulations.)

We also chose not to port the full set of checking software
from the RTL simulation environment to the cycle-based
simulation environment because of the effort involved and the
risk that performance would be reduced. Generators such as
our pseudorandom code generator predict the final register
and memory state of the processor, and defects will generally
manifest themselves as mismatches between the simulated
and predicted final state. It is possible that an error in state
will be overwritten before the end of a test case, but a
defect won't be missed unless this happens in every test case
that hits it, which is extremely unlikely statistically. Our
experience with hardware prototype testing, in which internal
signals are unavailable and all checking must be done
through final state, also made us confident in this strategy.

Cycle-Based Simulation Environment

Fig. 4 shows the cycle-based simulation environment, which
will be described by following the life cycle of a typical test
case. The job controller controls the 32 independent
simulations that are running in the data path positions, or slots, of
the cycle-based simulation model. It starts and ends test
cases in the 32 slots independently. It is controlled by a
UNIX shell, which is driven either by a script or
interactively for debug activities.

When a slot becomes available, the controller commands
the pseudorandom code generator to generate a new test
case, occasionally first reading a new control file. The test
case is specified by the initial state of memory and the
processor's registers, and the pseudorandom code generator
specifies the initial state of the caches as well to prevent
an initial flurry of misses.

The pseudorandom code generator downloads the initial
state of the simulation into various components of the
simulated model. These include the gate-level model, behavioral
models representing caches, register files, and other regular
structures, and emulators representing bus devices such as
the memory system, I/O adapter, and third-party processors.
The model is then stepped for numerous clock cycles until a
breakpoint trigger fires to indicate the end of the test case.
The pseudorandom code generator is then commanded to
extract the relevant final state from the simulated model and
compare it with the final state that it predicted to determine
whether the test case passed.

We used a simulation farm of up to 100 desktop
workstations and servers for cycle-based simulation. Jobs were
dispatched to these machines under the control of HP Task
Broker. Each job ran several thousand test cases that were
generated using a specific pseudorandom code generator
control file.

Multiprocessor Testing 

Multiprocessor testing was a key focus area. We wrote
emulators for additional processors and the I/O adapter, which
share the memory bus. It was only necessary to emulate the
functionality required to initiate and respond to bus
transactions, but the emulators were accurate enough that defects
in the processor related to cache coherency would manifest
themselves as mismatches in the final memory state.

We established linkages with our pseudorandom code
generator so that the emulators would be more effective. When a
test case started, the pseudorandom code generator
downloaded control file information so that parameters such as
transaction density and reply times could be easily varied.
The pseudorandom code generator also downloaded the
memory addresses used by the test case so that the
emulators could initiate transactions that were likely to cause
interactions.

Coverage Improvement 

Improving the test coverage of our pseudorandom code
generator was an ongoing activity. The pseudorandom code
generator has hundreds of adjustable values, or knobs, in its
control file, which can be varied to focus the generated test
cases. We found that the defect rate quickly fell off when all
knobs were left at their default settings.

We used two tactics to create more effective control files.
First, we handcrafted files to stress particular functional
areas. Second, we generated files using pseudorandom
techniques from templates, each template specifying a particular
random distribution for each knob. We found with both
strategies that it was important to monitor the quality of the
files generated.
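
The second tactic, drawing each knob from a distribution named in a template, might look like the Python sketch below. The knob names, the two distribution kinds, and the `name = value` file format are all hypothetical; the article does not describe the actual control file syntax.

```python
import random

# Hypothetical template: each knob names a distribution and its parameters.
TEMPLATE = {
    "trap_probability": ("uniform", 0.0, 0.05),
    "load_store_fraction": ("uniform", 0.2, 0.6),
    "address_reuse_probability": ("choice", [0.1, 0.5, 0.9]),
}


def generate_control_file(template, rng):
    """Draw one value per knob according to the template's distribution."""
    knobs = {}
    for name, spec in template.items():
        kind = spec[0]
        if kind == "uniform":
            knobs[name] = rng.uniform(spec[1], spec[2])
        elif kind == "choice":
            knobs[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown distribution: {kind}")
    return knobs


def render(knobs):
    """Format the knob settings as simple 'name = value' lines."""
    return "\n".join(f"{name} = {value:.4f}" for name, value in sorted(knobs.items()))


knobs = generate_control_file(TEMPLATE, random.Random(7))
print(render(knobs))
```

Seeding the generator per control file keeps each file reproducible, which matters when a file that exposed a defect has to be regenerated later.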

We did this in two ways. First, our pseudorandom code
generator itself reported statistics on the test cases generated
with a given control file. A good example is the frequency of
traps. Traps cause a large-scale reset inside the processor,
including flushing the instruction queues, so having too
many traps in a case effectively shortens it and reduces its
value. We made use of instrumentation like this to steer the
generation of control files.

Feedback is often needed based on events occurring within
the processor, which our pseudorandom code generator
cannot generally predict. For example, an engineer might need to
know how often the maximum number of cache misses are
pending to be confident that a certain area of logic has been
well-tested. Test case coverage analysis was accomplished
by an add-on tool in the simulation environment. This tool
included a basic language that allowed engineers to describe
events of interest using Boolean equations and timing delays.
The list of events could include those that were expected to
occur regularly or even those that a designer never expected
to occur. Both ends of this spectrum could provide useful
information.

Once the events were defined, the add-on tool provided
monitoring capabilities during the simulation. As test cases
were run, the tool would generate output every time it
detected a defined event. This output was then postprocessed
and assembled into an event database. The event database
could contain results of thousands of test case runs. Event
activity reports were then generated from this event
database. These reports included statistics such as frequency of
events, duration of events, the average, maximum, and
minimum distance between two occurrences of an event, and so
on.

The event activity reports were then analyzed by engineers
to identify weak spots in coverage and provide feedback to
the generation of control files. This methodology provided
one other benefit as well. For many functional defects,
especially ones that were hard to hit, the conditions required
to manifest the defect were coded and defined as an event.
Then this add-on tool was used with a model that contained
a fix for the defect to prove that the conditions required for
the defect were being generated.
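
The kind of statistics listed for the event activity reports can be computed from a list of occurrences, as in the Python sketch below. The record format (start and end cycle per occurrence) and the definition of distance as start-to-start spacing are assumptions; the article does not specify either.

```python
def event_report(occurrences):
    """Summarize one event from (start_cycle, end_cycle) pairs sorted by start."""
    starts = [start for start, _ in occurrences]
    gaps = [b - a for a, b in zip(starts, starts[1:])]  # start-to-start distance
    durations = [end - start + 1 for start, end in occurrences]
    return {
        "count": len(occurrences),
        "mean_duration": sum(durations) / len(durations) if durations else None,
        "min_gap": min(gaps) if gaps else None,
        "max_gap": max(gaps) if gaps else None,
        "mean_gap": sum(gaps) / len(gaps) if gaps else None,
    }


# Hypothetical occurrences of one event, e.g. "maximum cache misses pending".
report = event_report([(10, 12), (20, 20), (50, 53)])
print(report)
```

An event with a count of zero across thousands of test cases is the interesting result for coverage work: it either marks a hole that new control files should target, or confirms that a condition the designer never expected really does not occur.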

Switch-Level Simulation

In typical ASIC design methodologies, an RTL description is
the source code for the design, and tools are used to
synthesize transistor-level schematics and IC layout mechanically
from the RTL. Verifying the equivalence of the synthesized
design and the RTL is largely a formality to guard against
occasional tool defects. In the full-custom methodology used
on the PA 8000, however, designers handcraft
transistor-level schematics to optimize clock rate, die area, and power
dissipation. Therefore, we needed a methodology to prove
equivalence of the handcrafted schematics and the RTL.

At the time the project was undertaken, formal verification
tools to prove this equivalence were not available. Instead,
we turned to an internally developed switch-level simulator.
Although much slower than the RTL simulator, the
switch-level simulator included essential features such as the ability
to model bidirectional transistors, variable drive strengths,
and variable charge ratios. Thanks to this careful effort in
switch-level verification on the PA 8000, not a single defect
was found on silicon that was related to a difference between
the transistor-level schematics and the RTL.

Verification was performed by proving that a block behaved
the same when running a test case in the RTL simulator and
in the switch-level simulator. First, a full-chip RTL simulation
of a test case was done with the ports of a block monitored.
These vectors were then turned into stimulus and assertions
for a switch-level simulation of the block. Initializing the
state of the block identically in the two environments was a
challenge, especially since the hierarchies and signal names
of the RTL and schematic representations can differ.
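
The vector flow described above can be sketched in Python: port values recorded from one model are split into stimulus (inputs) and assertions (outputs) and replayed against a second model. The trace format, port names, and the stateless callable standing in for the switch-level block are all assumptions made for illustration; a real block would also need the state initialization the article mentions.

```python
def split_trace(trace, input_ports, output_ports):
    """Split per-cycle port samples into stimulus and expected outputs."""
    stimulus = [{p: cycle[p] for p in input_ports} for cycle in trace]
    assertions = [{p: cycle[p] for p in output_ports} for cycle in trace]
    return stimulus, assertions


def replay(block_model, stimulus, assertions):
    """Drive the model with the stimulus and collect any output mismatches."""
    mismatches = []
    for cycle, (inputs, expected) in enumerate(zip(stimulus, assertions)):
        actual = block_model(inputs)
        for port, value in expected.items():
            if actual[port] != value:
                mismatches.append((cycle, port, value, actual[port]))
    return mismatches


# Hypothetical trace captured at the ports of a block in the reference model.
TRACE = [{"a": 0, "b": 1, "y": 0}, {"a": 1, "b": 1, "y": 1}]
STIMULUS, ASSERTIONS = split_trace(TRACE, ["a", "b"], ["y"])

matching = replay(lambda i: {"y": i["a"] & i["b"]}, STIMULUS, ASSERTIONS)
differing = replay(lambda i: {"y": i["a"] | i["b"]}, STIMULUS, ASSERTIONS)
print(matching, differing)
```

Each mismatch records the cycle, port, expected value, and actual value, which is the minimum needed to start debugging a difference between the two representations.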

Initially, this strategy was used to turn on the switch-level
simulator models of individual blocks on the chip. This
helped to distribute the debug effort and quickly bring
all blocks up to a reasonable quality level. Afterward, the
focus shifted to full-chip switch-level simulator verification.
In addition to collecting vectors at the ports of the chip,
thousands of internal signals were monitored in the RTL
simulation and transformed into assertions for the
switch-level simulation. These were valuable for debugging and
raising our confidence that there were no subtle behavioral
differences between the two models.

The RTL simulation effort was a plentiful source of test
cases, but they were targeted at functional defects rather
than implementation errors, and the slower speed of the
switch-level simulator allowed only a portion of them to be
run. To improve coverage, the process shown in Fig. 5 was
used at the block level. The RTL description for the block
was converted into an equivalent gate-level model using
tools developed for cycle-based simulation. Automated test
generation tools, normally used later in the project for
manufacturing, were then used to create test vectors for the
gate-level model. If the switch-level simulation using these
vectors failed, then the two representations were known
to differ. While the automated test generation tools do not
generate perfect test vectors, this process still proved to be
a valuable source of additional coverage.

The switch-level simulator also supports several different
kinds of quality checks. These include dynamic decay
checking to detect undriven nodes, drive fight checking to
detect when multiple gates are driving the same node, and a
race checking methodology. This was implemented by
altering how the clock generator circuits were modeled to create
overlap between the different clocks on the chip. Failures
that arose from overlapped clocks pointed to paths requiring
detailed SPICE simulations to prove that the race could not
occur in the real circuits. Reset simulations were done from
random initial states to ensure that the chip would power up
properly. Finally, a switch-level simulator model was built
from artwork netlists to prove that there were no mismatches
between the artwork and the schematics that were missed by
other tools.

Postsilicon Verification

Presilicon verification techniques are adequate to find nearly
all of the defects in a design and bring the level of quality up
to the point where the first prototypes are useful to software
partners. However, defects can be found in postsilicon
verification that eluded presilicon verification for many reasons.

First, test code can be run at hardware speeds, several orders
of magnitude faster than even the fastest simulators. Errors
in the simulation models or limitations in the simulators
themselves can cause the behavior of the silicon to differ
from that predicted by the simulators. Finally, most
simulation is targeted at system components such as the PA 8000
itself, rather than the entire system. Errors in the
specifications for interfaces between components, or different
interpretations of these specifications, can become apparent when
all components are integrated and the system exercised.

Overlapped Test Coverage 

Previous projects had established the value of running test
code from as many sources as possible. Each test effort had
its own focus and unique value, but each also had its own
blind spots. This is even true for pseudorandom code
generators. In the design of these very complex programs, many
decisions are made that affect the character and style of
the generated code. There can be code defects as well that
cause coverage holes. The large overlap in coverage
between efforts proved to be an invaluable safety net against
the limitations and blind spots of individual tools.

The PA 8000 verification team focused its effort on
pseudorandom code testing. Experience showed that this would
be the primary source of subtle defects and would allow us
to find most defects before our software partners. We ran
several tools including our pseudorandom code generator
and generators used in the development of the PA 7200 and
PA 7300LC processors and the HP 9000 Model 725
workstation. Several tools were capable of generating true
multiprocessing test cases that included data sharing between
random sequences running on different processors. Data
sharing with DMA processes was implemented as well.

We developed a common test environment for most of these
random code generators. The HP-UX operating system is
not a suitable environment because its protection checks
do not permit many of the processor resources to be easily
manipulated. The test environment allowed random testing
of privileged operations and also included many features to
improve repeatability and facilitate debugging. For example,
it performed careful initialization before each test case so
that, aided by logic analyzer traces, we could move a failing
test case to the RTL simulator for easy debugging (in
hardware, there is no access to internal signals). We established
a plug-and-play API so that the investment in the
environment could be leveraged across several generators.

In parallel with our pseudorandom testing, our software
partners pursued their own testing efforts. While primarily
targeted at their own software, this provided stress for the
processor as well. The test efforts included the HP-UX and
MPE/iX operating system kernels, I/O and network drivers,
commands, libraries, and compilers. Performance testing
also provided coverage of benchmarks and key applications.



Finally, although it did not find any defects in the PA 8000,
HP's Early Access Program made available preproduction
units to customers and external application developers.

Ongoing Improvement 

When defects were found, we used the process shown in
Fig. 6 to learn as much as possible about why the defect was
missed previously and how coverage could be improved to
find additional related defects. After the root cause of the
defect was determined, actions were taken in the areas of
design, presilicon verification, and postsilicon verification.

The designers would identify workarounds for the defect
and communicate these to our software partners, at the
same time seeking their input to evaluate the urgency for
a tape release to fix the defect. The design fix was also
determined, and inspections were done to validate the fix
and brainstorm for similar defects.

In the area of presilicon verification, reasons why the defect
was missed would be assessed. This usually turned out to be
a test case coverage problem or a blind spot in the checking
software. Models would then be built with the design fix and
other corrections. Test coverage would be enhanced in the
area of the defect, and simulations were done to prove the
fix and search for any related defects. Cycle-based
simulation played a big role here by finding several introduced
defects and incomplete fixes.

The postsilicon verification activities were similar. Coverage
for the tool that found the defect would be enhanced either
by focusing it with control files or by tool improvements.
Spurred by a healthy rivalry, engineers who owned other
tools would frequently improve them to show that they
could hit the defect as well. All of this contributed to finding
related defects.



Fig. 6. Postsilicon quality improvement process. (Flowchart: a
defect is found and its root cause determined. Design: identify
workarounds, inspect for related defects. Presilicon
verification: determine why the defect was missed, improve test
coverage, build corrected models, search for additional defects.
Postsilicon verification: improve coverage, search for
additional defects.)






Performance Verification

At the time the PA 8000 was introduced in products, it was
the world's fastest available microprocessor. Careful
microarchitectural optimization and verification of the design
against performance specifications were factors in achieving
this leadership performance.

In a microarchitectural design as complex as the PA 8000,
seemingly obscure definition decisions and deviations of the
design from the specification can cause a significant loss of
performance when system-level effects and a variety of
workloads are considered. A good example is a design
defect that was found and corrected in the PA 8000. When
a cache miss occurred under certain circumstances, a dirty
cache line being evicted from the cache would be written out
on the system bus before the read request for the missing
line was issued. Since the addresses of the two lines have
similar low-order bits, both mapped to the same bank of
main memory. The memory controller would begin
processing the write as soon as it was visible on the bus, busying
the memory bank and delaying the processing of the more
critical read.

A detailed microarchitectural performance simulator was
written early in the project to help guard against such issues.
It was used to project performance and generate a statistical
profile for a variety of benchmarks and applications.
Workloads with surprising results or anomalous statistics were
targeted for more detailed analysis, and through this process
opportunities were identified to improve the
microarchitecture. Particularly valuable feedback came from the compiler
development team, who used the simulator to evaluate the
performance of compiler prototypes. The concurrent
development of tuned compilers with close cooperation between
hardware and software teams was a key contributor to the
PA 8000's performance leadership.

The microarchitectural performance simulator was written at
a somewhat abstract level, so it could not provide feedback
on whether the detailed design met the performance
specifications. Comparing the performance of the RTL simulator
against the microarchitectural simulator was the obvious
way to address this, but the RTL simulator was far too slow.
As a compromise, we performed this comparison on key
performance kernels that were tractable enough for the RTL
simulator. We also developed a path by which a workload
could be run up to a critical point using the
microarchitectural simulator, at which point the state of the memory,
caches, and processor registers could be transferred into
the RTL simulator for detailed simulation.

Performance verification continued in the postsilicon phase
of the project. The PA 8000 incorporated several performance
counters that could be configured to count numerous events.
These were used to help identify workloads or segments of
workloads needing closer analysis. The PA 8000's external
pins and debug port provided sufficient information to
determine when instructions were fetched, issued for
execution, and retired. Isolation of specific performance issues
was aided by a software tool called the depiper, which
presented a visual picture of instruction execution. Through
these efforts, several performance-related hardware defects
were identified and corrected before production.

Results

Achieving a defect-free design at first tape release is not a
realistic expectation for a design as complex as the PA 8000.
Nevertheless, we were extremely satisfied with the quality
we achieved at first tape release. The first prototypes were
capable of booting the operating system and running virtually
any application correctly. In fact, only one defect was ever
hit by an application, although a few defects were
encountered in stress testing of system software.

Fig. 7 shows the sources of defects found and corrected after
first tape release. Surprisingly, about a third of the total
defects were found by continued use of the presilicon
verification tools (mostly the cycle-based simulation environment)
for a few months following tape release. This indicates that
despite the outstanding performance of cycle-based
simulation, the project would have benefited from even more
throughput, or perhaps use of the tool earlier. A third of the
defects were also found by one of the pseudorandom code
generators running on hardware prototypes. Inspections
were a significant source of defects. The remaining defects
were split between turn-on work, performance analysis
work, and partner software testing. Since very few defects
were discovered by partners, we could generally
communicate workarounds ahead of time and take other steps to
minimize the impact.

Fig. 8 shows the impact of the defects found after first tape
release on our software partners. A huge majority were never
seen outside the environment in which they were found and
had no significant impact. About half of these involved functional
areas, such as debugging hardware, that are not even
visible to applications or system software. Most of the remaining
defects had only a moderate impact. Examples are
defects that were found by a partner at the expense of their
testing resources, defects that required a workaround in system
software, and defects that required certain performance-related
features in the processor to be disabled. Only a handful
of defects were severe enough to temporarily block or
significantly disrupt a partner's development and testing
efforts. All but one of these were early multiprocessing defects
that slightly delayed bringing up the multiprocessing
operating system.




Fig. 7. Sources of defects found and corrected after first tape
release.

Fig. 8. Partner impact.

Fig. 9. Defects escaping traditional presilicon verification.



Cycle-Based Simulation Results 
The cycle-based simulation effort made an essential contribution
to the verification of the PA 8000. Fig. 9 shows the
sources of defects that eluded our RTL simulation effort
(which incorporated existing best practices). If we had not
made the investment in cycle-based simulation, the number
of defects that would have had to be found by postsilicon
techniques would have been three times higher. It was much
less expensive to fix the defects caught by cycle-based simulation
as the design progressed than it would have been to
fix them in later revisions.

Also, because cycle-based simulation tended to find the
most severe defects early, no masking defects were present,
and the number of serious blocking defects that we had to
manage after the first tape release was reduced by three to
six times. If our software partners had been exposed to this
level of severe defects, it is probable that the product's time
to market would have been impacted.

Finally, cycle-based simulation provided a high-confidence
regression test before each tape release. Several incomplete
bug fixes and new defects that had been introduced in
the design were found in time to be corrected before a tape
release.

Conclusions

Continuing innovation in functional verification tools and
processes is required to keep pace with the increasing
microarchitectural complexity of today's CPUs. This paper has
described the methodologies used to verify the PA 8000.
These met our most important goal of improving the quality
of the PA 8000 to the high level demanded by our customers.
By finding defects early, they also helped us conserve our
engineering resources and quickly deliver the industry-leading
performance of the PA 8000.

Electrical Verification of the
HP PA 8000 Processor



Electrical verification applies techniques from both functional verification 
and reliability and environmental testing to improve the quality of the 
CPU. Electrical verification checks that the CPU functions correctly under 
stressful environmental conditions, well outside the normal operating 
environment. 



by John W. Bockhaus, Rohit Bhatia, C. Michael Ramsey, Joseph R. Butler,
and David J. Ljung



Early in a product's design life cycle, considerable attention
is paid to its functional correctness. This functional verification
is carried out on the earliest prototypes and, especially
in the case of complex devices such as large VLSI circuits,
even earlier through simulation.

As a product approaches customer shipments, testing it
against HP's stringent reliability and environmental specifications
is a critical task.

In between these two test methodologies, there is a third
method that is becoming increasingly important. This is
postsilicon electrical verification. Electrical verification
applies techniques from both functional verification and
reliability and environmental testing to improve the quality
of a device or product.

The philosophy of electrical verification is different from the
other two methods. While it is possible, although not necessarily
common, to complete functional verification and reliability
testing without finding reasons to change a design,
electrical verification's goal is to find a design's weaknesses
and fix them. Even a very good design should come out of
electrical verification with a higher level of quality.

Like functional verification, electrical verification seeks to
exercise as many of a design's logic states, signal paths, and
state transitions as possible. However, entering a state, driving
a signal, or triggering a transition once is not enough for
electrical verification. Combining reliability testing with the
coverage of functional verification, electrical verification
repeats the functional tests under stressful conditions
beyond what a product may ever see in a real application.

Electrical verification goes beyond the limits of reliability and
environmental testing, which is typically done only at the
system level. Reliability tests are usually done independently
of each other. For example, line voltage variations are not
applied concurrently with ambient temperature tests. Reliability
tests stop at predefined limits. In contrast, electrical
verification varies many test parameters at the same time.
The ranges of those parameters are continually increased
until failures are found.



Electrical verification is not simply a random scattering of
tests executed in the hopeful search for some kind of statistical
confidence. Instead, the goal is complete coverage
of the product's operating space and beyond. This serves
two purposes. First, the design is verified over all of the
combinations of actual conditions it may encounter. Secondly,
failure mechanisms or critical features that may lie outside
normal operating limits can be found, identified, and possibly
fixed. These items, such as critical timing paths, charge
sharing conditions, or marginal driver strengths, can move
inside normal limits as a product ages, manufacturing
conditions change, or other unanticipated situations arise.
Removing them early in the design life cycle ensures a reliable
product with a longer life for the customer and avoids
a costly in-production change for HP. Furthermore, fixing
these electrical failures can increase the yield of the device
at a given frequency and can enable higher-frequency and
higher-performance upgrades.

An additional purpose of electrical verification is to help
determine production test limits and guardbands. By including
IC process parameters in the verification effort, test limits
can be extrapolated to predict proper function through
normal operating conditions.

Fig. 1 is a flowchart of the electrical verification process.

Shmoo Plots

The hunt for electrical bugs begins with the shmoo plot. The
shmoo was a character in the Li'l Abner cartoon strip that
had a changing, bloblike shape. The shmoo plot shows
whether the device under test passed or failed as a function
of various combinations of electrical parameters applied to
it. The name has become part of the engineer's vernacular
because the region where a particular device passes or fails,
plotted against the parameters applied, may have some of
the rounded curves and shifting nature of the mythical
shmoo. Often, the shape of the plot conveys information
about the failures. Many common shapes have been given
names (see "Shmoo Plot Shapes").

Fig. 1. The electrical verification process.

The shmoo's name has also become a verb. To shmoo is to
run a test repeatedly while varying one or more parameters.
Those parameters are the axes of the shmoo plot and include
both the obvious and the nonobvious.
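The idea of shmooing can be illustrated with a short sketch. It is not the tooling used on the PA 8000; the function names, the two chosen axes (supply voltage and clock frequency), and the toy device model are all assumptions made for illustration. A test is run once at every grid point and the pass/fail results are drawn as a text plot.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (supply voltage, clock frequency in MHz)


def run_shmoo(test: Callable[[float, float], bool],
              voltages: List[float],
              freqs_mhz: List[float]) -> Dict[Point, bool]:
    """Run `test` once at every (voltage, frequency) grid point."""
    return {(v, f): test(v, f) for v in voltages for f in freqs_mhz}


def render_shmoo(results: Dict[Point, bool],
                 voltages: List[float],
                 freqs_mhz: List[float]) -> str:
    """Draw the grid as text: '*' = pass, '.' = fail, highest frequency on top."""
    rows = []
    for f in sorted(freqs_mhz, reverse=True):
        cells = "".join("*" if results[(v, f)] else "." for v in sorted(voltages))
        rows.append(f"{f:6.0f} MHz |{cells}")
    return "\n".join(rows)


def toy_device(voltage: float, freq_mhz: float) -> bool:
    """Hypothetical device model: maximum passing frequency rises with voltage."""
    return freq_mhz <= 100.0 * voltage


if __name__ == "__main__":
    volts = [1.0, 1.5, 2.0, 2.5]
    freqs = [100.0, 150.0, 200.0, 250.0]
    print(render_shmoo(run_shmoo(toy_device, volts, freqs), volts, freqs))
```

In a real setup the `test` callable would set the physical operating point and run code on the device under test; here it is only a stand-in so the sweep and the plot can be seen end to end.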

Obvious shmoo plot parameters include power supply voltage
and temperature. In complex systems such as those using
the PA 8000, many different supply voltages exist. Supply
voltages and temperature are clear choices for shmoo
plotting because they so directly affect the operation of
electronic devices.

The IC manufacturing process is a key shmoo plot parameter
for projects like the PA 8000. In keeping with the goal of
methodically covering the shmoo space, testing a large,
random sample of parts isn't enough. Instead, prototype
parts are manufactured with one or more process metrics
intentionally modified. Typical IC shmoo plot parameters
are transistor gate length and leakage currents.

Clock frequencies are also good shmoo plot parameters.
Increasing the frequency of clocks is a good way to find
slow signal paths. However, pushing frequencies higher only
tells some of the device's story. Useful information can also
be found by seeing how slowly it can go and by testing many
frequencies in between the maximum and minimum. Logic
races and transmission line reflections are just two potential
problems that may lurk in the low frequencies, which the
engineer often assumes are "easy."



A shmoo plot parameter that may be less obvious is the software
executed on the device under test. Good shmoo plot
code seeks to exercise as much of the device as possible.
Executing a power-up self-test or booting an operating system
may seem complicated, but they are not necessarily
good shmoo tests. The PA 8000 shmoo process used a large
number of tests that ranged from specially designed to
randomly generated.

Test Cases 

The success of any verification effort depends to a large
extent on the nature and type of test cases that are run on
the CPU. The testing code needs to be good enough so that
when the systems reach customers the CPU is bug-free. To
ensure this, the tests must provide adequate coverage of the
design features incorporated into the CPU. Furthermore,
because of the complexity of today's processors, it is impossible
to imagine all of the interactions that must occur to
cause a particular event in the processor. This complexity
necessitates the use of random testing. In the postsilicon
environment, random testing is aided by the fact that
throughput is generally not a problem (compared with the
presilicon simulations done on software models, which are
millions of times slower than running on the actual hardware)
and therefore a huge volume of random testing can
be accomplished.

The electrical verification of the HP PA 8000 CPU relied
upon the following sources for test cases:

• Directed handwritten tests

• Focused random tests targeted at specific CPU functions

• Random code generators

• Library of worst-case tests for previous bugs

• HP-UX application code.

In most instances, these test cases were checking for failures
in real time. In general they would set up some initial
conditions, run the test code, and perform some checking.
Some cases checked a specific location in memory or a
general register. Others compared the full architectural state
at the end of the test case to some expected final state.

For most of the sources mentioned above, the test cases
were leveraged from the presilicon verification effort
through the use of various scripts to perform modifications
for the postsilicon operating environment. Several benefits
can be realized by leveraging the work from presilicon
verification. First, leveraging all the tools results in less
development time and hence less total work for the postsilicon
verification team. Second, by sharing the tests and tools, we get
a common environment between presilicon and postsilicon
verification. This allows easier modeling of postsilicon failures
in presilicon simulation tools and makes the learning
curve easier as well. It also provides a path to go back and
forth between the two environments, which allows directed
test development for postsilicon failures.

Since presilicon verification is targeted at functional
correctness, the use of tests leveraged from that effort requires
some caution. Electrical verification tests in general need to
be much more data-pattern-sensitive than their functional
counterparts. For example, a bus may need to be driven
from all zeros to all ones or alternating ones and zeros to
adequately test for electrical failures, whereas the data
patterns in a functional test don't usually influence the logical
correctness of the design. Therefore, the random code
generators used in the electrical verification effort were modified
to provide knobs that allowed the test case writers to control
the data patterns used in these tests.
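A data-pattern knob of this kind can be pictured with a minimal sketch. The helper names and the particular pattern set below are assumptions for illustration, not the knobs of the actual PA 8000 code generators. The sketch builds the all-zeros, all-ones, and alternating patterns for a bus of a given width and counts how many bus lines switch between consecutive values.

```python
from typing import List


def bus_stress_patterns(width: int) -> List[int]:
    """Return all zeros, all ones, 0101..., and 1010... for a `width`-bit bus."""
    mask = (1 << width) - 1
    alternating = sum(1 << bit for bit in range(0, width, 2))  # 0101...01
    return [0, mask, alternating, alternating ^ mask]


def toggled_bits(previous: int, current: int) -> int:
    """Count the bus lines that switch when `previous` is followed by `current`."""
    return bin(previous ^ current).count("1")


if __name__ == "__main__":
    patterns = bus_stress_patterns(8)
    for before, after in zip(patterns, patterns[1:]):
        print(f"{before:08b} -> {after:08b}: {toggled_bits(before, after)} lines toggle")
```

Driving all zeros followed by all ones, or one alternating pattern followed by its complement, switches every line at once, which is the kind of simultaneous switching a purely functional test has no reason to produce.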

Our library of worst-case test code for all bugs known to
date was valuable in ensuring that no known failure mechanisms
had gone uncovered as we went through different
revisions of the chip. This library of tests was always kept
updated and used repeatedly on all CPU parts.

One final source of test cases was our HP-UX stress test
suite. Not only is this most similar to what the majority of
customers are going to run, but HP-UX testing can also offer
the most coverage. Usually this is used as the last set of testing
to make sure that all of our test coverage so far has been
close enough to the stress that HP-UX code puts the PA 8000
through. The stress test suite consists of a number of applications
including the SPEC benchmarks and scripts that make
sure that everything is running correctly. The drawbacks
are that the run time for HP-UX stress tests is a few hours,
which is orders of magnitude longer than the other test
types, and that these tests are much harder to debug if
failures occur.

Automated Tools 

To save time, we created a number of tools that help us automate
many of the verification tasks. Each of our test systems
was connected to a controller system (see Fig. 2). We then
replaced the boot ROMs on the test system with ROM emulators.
Our controller system controlled the voltages, frequencies,
and temperature and the code that was run on the
test system, and it also monitored all of the I/O to and from
the test system through the RS-232 port.

We needed to have complete control over the code that was
run on the test system. Instead of using the boot ROM to
initialize the system so that we could run a full operating
system, we used our own framework code called the
CharROM, short for characterization ROM (see Fig. 3). The
CharROM installed a subset of the normal initialization code
and then ran test code for us. All of our tests were compiled
into a separate ROM called a testROM, which could be uploaded
without changing the CharROM. Each testROM contained
the test code as well as information on how to run
each test, such as verbosity settings and number of iterations.
It also contained information about the test that was printed
out so that our tools could save information on what tests
we were running. With our systems set up this way, we could
turn on power and within a few seconds the test system
would be printing out initialization information and then test
codes.

The CharROM had the flexibility to run tests in batch mode,
one at a time according to the testROM, or interactively for
debugging. In interactive mode the CharROM let us run tests
and modify and view the state of the chip.

The big advantage of the CharROM was that it booted quickly
and let us change tests rapidly. This saved a great deal of
time during debugging when we needed to run many code
experiments. It also eliminated a lot of the boot code that
would be needed to run something like the HP-UX operating
system. This meant that we were less likely to hit a failure
while booting and if we did we could often make quick
changes to the CharROM to avoid the failure. This was most
obvious when we discovered an electrical bug in a branch
instruction that was keeping us from booting. We were able
to rewrite the CharROM framework in two days without
using the failing types of branches, something that we could
never have done with a full-fledged boot ROM.

The most important feature of the CharROM was the control
it gave us over exactly what code was being run on the
system. Running tests in an environment like HP-UX leaves
the tester at the mercy of the operating system, which can
switch processes in and out and limit access to privileged
operations.

Fig. 3. The CharROM makes it possible to boot quickly and change
tests rapidly. Boot starts at the beginning of the CharROM. arch_exec
and the testROM are copied to memory for speed reasons. Basic
assembly tests are run either in ROM or in memory. Modified phase 1
format tests are unpacked into available memory by arch_exec, which
runs and checks them.

Shmoo Plot Shapes

A shmoo plot is a graph that represents how a particular test passes or
fails when parameters like frequency, voltage, or temperature are varied
and the test is executed repeatedly. The shape of the failing region is
meaningful and helps in determining the cause of the failure. Shmoo
plots typically fall into familiar categories with descriptive names.

A shmoo plot of normal circuit operation shows better high-frequency
performance as supply voltage increases, as shown in Fig. 1a. However,
other shapes frequently seen include the curlback (Fig. 1b), ceiling
(Fig. 1c), floor (Fig. 1d), wall (Fig. 1e), finger (Fig. 1f), and breaking
wave (Fig. 1g).

One of the important responsibilities of the CharROM was
to initialize the chip state between tests. This helped control
repeatability for a given test (so we weren't dependent on
a random chip state) and also helped avoid dependencies
between tests, that is, a previous test affecting the run of a
future test.

Inside the CharROM we had a subframework called arch_exec
that could run our modified presilicon test cases. arch_exec
takes apart the initial state and sets up the chip accordingly.
After the test is run, arch_exec compares the chip state to the
expected final state information in the test, automatically
showing us any failures. This let us deal with many tests in
bulk.
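The final-state check described here can be sketched in a few lines. This is not arch_exec itself, which ran on the test system; the dictionary representation of chip state and the register names are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, int, Optional[int]]  # (name, expected value, observed value)


def compare_final_state(expected: Dict[str, int],
                        actual: Dict[str, int]) -> List[Mismatch]:
    """List every register or memory word whose observed value differs."""
    mismatches: List[Mismatch] = []
    for name in sorted(expected):
        observed = actual.get(name)  # None if the value was never captured
        if observed != expected[name]:
            mismatches.append((name, expected[name], observed))
    return mismatches


if __name__ == "__main__":
    # Hypothetical expected and observed final states for one test case.
    expected = {"gr1": 0x0000_0005, "gr2": 0x0000_0007, "mem[0x100]": 0xFFFF_FFFF}
    observed = {"gr1": 0x0000_0005, "gr2": 0x0000_0009, "mem[0x100]": 0xFFFF_FFFF}
    for name, want, got in compare_final_state(expected, observed):
        print(f"FAIL {name}: expected {want:#010x}, got {got:#010x}")
```

Because the check is mechanical, a test passes exactly when the returned list is empty, which is what makes it practical to run generated tests in bulk.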

To run our shmoo tests we had scripts that would boot the
systems at each point in the shmoo test domain and read
the output to decide if the tests had passed or failed. After
running a shmoo test or even during a shmoo test we could
analyze the output to ignore certain failures or focus on
specific failures.

At the completion of a shmoo test, the shmoo script stored
all the output in our shmoo database. We used the database
to look for specific tests or specific parts so we could avoid
duplicating work. This also turned out to be very useful
when we needed to take another look at past bugs; we still
had all of the shmoo information for the bug work.

Failure Identification

The process of electrical verification of a CPU begins with
the task of identifying electrical failures. An electrical failure
can be described as the malfunction of a chip under certain
but not all operating environments. If the failure occurs
regardless of the operating environment, then it is termed a
functional failure and is not covered here. Typically, electrical
failures can be traced to some electrical phenomenon
that occurs only under certain operating environments.
Some examples include latch setup or hold time violations,
noise issues, charge sharing, leakage issues, and cross talk.

The task of identifying new failure mechanisms involves a
number of steps. First, test code must be selected and run in
a variety of operating environments. The data collected from
this is displayed graphically in shmoo plots and anomalies
are noted. Next, the anomalies are checked for repeatability.
If the anomaly repeats reliably, the failure signature is analyzed
in an attempt to classify the failure or narrow it down
to a particular area of the CPU, if possible. The sensitivities
to different operating environment variables are also determined
to gain further understanding of the failure mode
before it is moved into the debugging stage. Each of these
steps will be discussed in more detail.

Searching for an Anomaly

Electrical verification of the PA 8000 CPU encompasses a
huge test space that is impossible to cover in a reasonable
amount of time. This is partly because of the increasing
complexity of devices and partly because of the large number
of operating environment variables. Operating variables
include test code, ambient temperature, several supply voltages,
frequency and bus ratios, types of CPU chips (i.e., variations
in the fabrication process), different speed grades of
cache SRAMs, and many others. A number of techniques
were applied in the electrical verification of the PA 8000
CPU that, taken together, effectively covered this large test
space.

Initially, the emphasis is placed on varying a large number of
the operating variables and exercising the CPU with simple
test code. A variety of CPU parts from different corners of
the fabrication process are deliberately selected and run
under various combinations of temperature and supply voltages
to look for failures. For example, known fast PA 8000
CPUs were run in a cold chamber at high supply voltages to
look for one class of failures. Similarly, a set of slow CPUs
were run in a hot chamber at low supply voltages to look for
another class of failures. Experience is always a good guide
to the operating variable combinations that are likely to yield
failures.

Stress testing is another technique that is applied to induce
failures. Stress testing refers to running the CPU with test
code on the fringes of the operating environments, under
conditions to which an actual system in the field may never
be subjected. However, a failure induced in this fashion can
often be moved into an operating region that we care about,
simply by further experimentation and analysis.

As the process of electrical verification proceeds, the emphasis
shifts from running simple test code at a variety of
operating points to more complex code sequences at fewer
operating points. This can be compared to moving from a
breadth-first search of the test space to a depth-first search.
The more complex code sequences are derived from running
several random code sequence generators, pseudorandom
focused tests, directed tests, and HP-UX application
code.

Using one or more of the techniques outlined above, test
data is gathered and can be viewed in shmoo plots. These
plots are examined and compared to what has been observed
in the past for previous test runs on earlier silicon
revisions. If the shmoo plots reveal regions of failure not
observed before, then we have a shmoo anomaly, which
needs to be pursued further.
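Comparing a new shmoo plot against earlier runs amounts to looking for grid points that fail now but did not fail before. A minimal sketch follows, assuming each plot is stored as a mapping from an operating point to a pass/fail flag; that representation and the function name are assumptions, not a description of the actual shmoo database.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # e.g. (supply voltage, clock frequency in MHz)


def new_failures(baseline: Dict[Point, bool],
                 current: Dict[Point, bool]) -> List[Point]:
    """Return points that fail in `current` but did not fail in `baseline`.

    Points absent from the baseline are treated as previously passing, so a
    failure at a point that was never run before is also reported.
    """
    return sorted(point for point, passed in current.items()
                  if not passed and baseline.get(point, True))


if __name__ == "__main__":
    earlier_revision = {(2.0, 100.0): True, (2.0, 200.0): False, (2.5, 200.0): True}
    latest_run = {(2.0, 100.0): False, (2.0, 200.0): False, (2.5, 200.0): True}
    print(new_failures(earlier_revision, latest_run))
```

A nonempty result only flags candidates; whether they form a genuine anomaly still depends on the repeatability checks.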

Verifying Repeatability

Once a shmoo anomaly is identified, a number of steps need
to be taken to validate and confirm it. It is important that the
anomaly be reliably repeatable and be traceable to a CPU
malfunction. The steps outlined below are used to satisfy
the repeatability requirement.

1. The failing code sequence is rerun several times on the
same CPU to confirm failure. This is done to rule out the
possibility that an inadvertent change in the operating
environment may have induced the failure.

2. In a system verification environment, several other components
must be removed from suspicion before the anomalous
behavior can be attributed to the CPU. The failing CPU
can be placed in a completely different system and the failing
code sequence rerun under similar operating conditions to
repeat the failure. The failing CPU can also be used in several
different processor boards to rule out any dependencies on
the processor board characteristics.

3. The next step is to try to locate the failure mode on different
but similarly fabricated CPU parts. These could be parts
from the same wafer or with similar process characteristics.
If the failure mode is not observed on any other CPU, then it
is generally considered to be a test escape from the wafer
and package screens, meaning there is a defect on this chip
that the wafer and package screens did not find. In other
words, we have not found an inherent problem with any
circuits on the chip. In that case, we will investigate whether
we have a coverage hole in our wafer or package screens,
rather than move this failure into the debugging phase.

4. If the above three steps are satisfied, then the failure mode
is checked for sensitivities to different operating environment
variables. Most electrical failures are modulated by one or
more of the operating variables, whether it be temperature,
supply voltage, or delays on key system clocks. This can not
only expand the failure region, but also provide some clues
to the type of failure, which can be extremely useful information
for the task of debugging.

5. Throughout the process of verifying the repeatability of
the failure mode, it is also important to watch the failure
signature and check it for consistency. That is, one must
ensure that each rerun of the test code is producing the
same failure mode.
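The rerun and signature-consistency checks in the steps above can be sketched as follows. The callable that runs the test and the string form of a failure signature are assumptions for illustration; in practice each rerun would be driven by the controller system at a fixed operating point.

```python
from collections import Counter
from typing import Callable, Optional


def check_repeatability(run_once: Callable[[], Optional[str]], reruns: int) -> dict:
    """Rerun a failing case and summarize how reliably it fails.

    `run_once` returns None on a pass or a failure-signature string on a fail.
    """
    if reruns <= 0:
        raise ValueError("reruns must be positive")
    signatures = [run_once() for _ in range(reruns)]
    failures = [sig for sig in signatures if sig is not None]
    return {
        "fail_rate": len(failures) / reruns,      # fraction of reruns that failed
        "consistent": len(set(failures)) <= 1,    # same signature on every failure
        "signatures": Counter(failures),          # how often each signature appeared
    }


if __name__ == "__main__":
    # Hypothetical recorded outcomes standing in for real reruns.
    recorded = iter(["gr5 mismatch", None, "gr5 mismatch", "gr5 mismatch"])
    print(check_repeatability(lambda: next(recorded), reruns=4))
```

A failure that shows several different signatures across reruns is a hint that more than one failure mode, or an unstable operating environment, is involved.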

Classifying the Failure Mode

After a shmoo anomaly is identified and has passed the
repeatability requirement, it is time to classify and list the
characteristics of the failure mode to see if it is unique or is
one that has been observed before. Either classification is
important. If it is new, then it needs to be debugged fully and
its root cause determined. On the other hand, if it is a repeat
failure mode, then this failing code sequence needs to be
compared with the current known worst case for that failure
mode and understood as well.

Failing code sequences can come from a variety of sources.
Typically, each failing sequence will have a certain failure
signature and much can be learned from it. Here, we will
discuss three types of failures.

The first type of failure comes from self-checking code.
A typical example might be code written to exercise and
walk known patterns through the cache SRAMs. Such code
will check its results and the failure messages will be
self-explanatory.

The second type of failure is a final-state error, generally
produced by random code generators. Random code generators
produce tests that consist of initial CPU state, a sequence
of assembly instructions, and an expected final state. When
a test terminates, the final state is checked against the expected
final state and discrepancies are noted. By analyzing
the final-state error messages and looking at the test code
sequence, one can infer quite a bit about the nature of the
failure and come up with a set of experiments to further
zero in on the failure.

The third type of failure is one in the framework code.
Framework code is code written to allow tools such as random
code generators to run on the hardware. The framework
provides the environment for initializing memory,
caches, and the architected state of the CPU. Sometimes,
especially early in the project, failures will occur in running
the framework code. In general, these are hard to debug
since the failing code sequence (the framework) could be
thousands or millions of instructions long.

The characteristics of the failure mode are determined by
noting the sensitivities to different operating environment
variables from the repeatability experiments above or
through additional experiments at this stage. It is important
to do some amount of debugging and failure characteristic
determination to rule out most known failure modes to date.
To summarize, the task of bug identification is complete
when we have accomplished all of the above and have made
a reasonable effort to rule out known problems. We now
have a new bug that is ready to be taken through the next
task, debugging.

Debugging 

The goal of the debugging effort is to determine the root
cause of the failure and fix it on the chip in a new revision.
The main steps to achieve this goal are gathering data about
the failure, expanding the failure region, and hypothesizing
the cause of the failure. These steps are all part of an iterative
process that can lead us to our goal of complete understanding.
As more data is gathered about the failure, a more
complete and accurate hypothesis can be formed and a
more accurate worst-case vector can be determined. On the
other hand, new information may also prove that our initial
hypothesis was incorrect. In that case, we go back to the
data gathering step to acquire more information about the
failure. When all of our data is consistent with our hypothesis,
we have the root cause of the failure.

Gathering Data 

Once we have determined from the bug identification process
that the failure is one that we have not seen before, we
need to gather as much data as possible about this new failure.
If multiple chips fail in the same way, we may be able to
correlate the failure with a specific wafer or lot or to a specific
speed grade of the chip. For example, it is possible that
only chips with extremely slow FETs will fail. Checking
which revisions of the chip fail can tell us whether the failure
is related to a recent change in the chip, or whether it
has always been there.

The next step is to shorten the instruction sequence that will
cause the failure. Often, failures occur in sequences of over
100 instructions, but the failure itself usually requires only
a few specific data patterns and some specific instruction
timing. Determining where in the instruction sequence the
failure is occurring is one step toward isolating the failure.
Occasionally, the failing instruction is easy to find. For example,
if only one instruction in the case modified the failing
register and the inputs to that one instruction did not change
during the case, the failure has been isolated to the instruction
sequence around that instruction. Usually, it is more
difficult. If the failure causes an unrecoverable trap or reads
bad data from the cache or main memory, we need more
information before deciding where the failure occurred.
For example, if a load from cache reads the wrong data, is
it because another instruction stored bad data, or were the
address lines incorrect during the read, or did the CPU
corrupt the data after it was read?

Failures in the framework of a random code generator are
quite difficult to debug, especially if the framework is written
in a high-level language such as C++. If the failure can be
localized to a specific code sequence, which could be quite
long, it can usually be ported to a standalone case (no
framework involved). From there, the same steps are taken
as with any failure sequence to shorten the test case while
maintaining the same failure mode.

Monitoring external events may be useful. Logic analyzers
attached to external pins, such as the system bus interface
and the cache interface, provide a picture of what the CPU
is doing when the failure occurs and can help narrow down
where in the code it is failing. We may see instruction fetches
from main memory, which can tell us what area of the code
the CPU is executing. Since the logic analyzer stores many,
many previous states, we can look back through the
execution of the case to see when bad data starts to appear. If
the failure relates to an off-chip path, oscilloscopes can be
used to verify the signal integrity of suspect paths. We have
used this method when debugging noise-related problems and
failures caused by imprecise impedance matching.

We would like to narrow down the code sequence to a very
short sequence of instructions that will still fail in the same
way as the original case. Creating a very short case such as
this is not easy, especially considering that the PA 8000 uses
out-of-order execution. Changing the original test case in
any way may change the timing of certain events in the case
such that it may not fail anymore. In gathering information
about the failure, it is important to determine what events
contribute directly to the failure and in what sequence the
events must occur for the failure to occur. Just removing
instructions starting at the beginning of the case may not
help. Suppose that a load instruction, which normally would
have caused a cache miss and a request to main memory, is
removed from the beginning of the case. The behavior of the
case will change because the next memory operation that
accesses that cache line will cause the cache miss instead.
This change in timing may cause two events in the CPU that
were concurrent in the original case to be separated by many
states in the modified case. Removing instructions may also
have the effect of changing register data patterns that may
have been required for the failure. If an add instruction that
sets up a 0x0F0F0F0F data pattern in a register is removed,
that register will contain its initial value instead (different
from the pattern set up by the add) and the failure may not
occur.

Experiments with the failing code are still very important.
Removing irrelevant instructions and data can narrow the
search for the failure. It is possible that a large number of
instructions in the failing sequence can be removed without
affecting the failure. The data patterns in the source registers
for one or more instructions might be changed without
affecting the failure. Each of these changes narrows the
search for the failure mechanism. Very slight changes in the
code sequence or data patterns can provide information on
what events are necessary for the failure. If we change only
one bit in one data pattern and the failure goes away, that is
a big indication that the failure requires one bit to be set a
certain way. Another useful step to narrow our search is to
determine if any specific CPU features are involved in the
failure. For example, if we can turn off a specific CPU
feature, such as bypassing a register value from a pipeline
stage, and the case now passes, we might say the failure is
occurring in the bypass logic, or at least the timing of the
case requires a bypass.
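
The instruction-removal experiments described above can be sketched as a
greedy reduction loop. This is only an illustration, not the tooling the
authors used: `shorten_case`, `still_fails`, and `toy_oracle` are
hypothetical names, and the toy oracle stands in for actually running a
candidate case on silicon and checking for the same failure mode.

```python
from typing import Callable, List, Sequence


def shorten_case(instructions: Sequence[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop instructions while the case keeps failing the same way."""
    case = list(instructions)
    if not still_fails(case):
        raise ValueError("original case does not fail")
    changed = True
    while changed:
        changed = False
        # Walk from the end so that earlier indexes stay valid after a removal.
        for i in range(len(case) - 1, -1, -1):
            candidate = case[:i] + case[i + 1:]
            if still_fails(candidate):
                case = candidate
                changed = True
    return case


def toy_oracle(case: List[str]) -> bool:
    """Hypothetical failure: needs a load, then an add, then a store, in order."""
    try:
        i = case.index("load")
        j = case.index("add", i + 1)
        case.index("store", j + 1)
        return True
    except ValueError:
        return False


original = ["nop", "load", "nop", "sub", "add", "nop", "store", "nop"]
print(shorten_case(original, toy_oracle))  # ['load', 'add', 'store']
```

A real oracle would also have to confirm that the failure mode is unchanged,
since removing an instruction can shift timing and make the case pass for an
unrelated reason.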

While we are doing the code experiments, we may use some
of the on-chip test and debugging circuitry to get a better
picture of the failure. By running the chip in both a passing
region and a failing region and comparing the two runs, we
can get a picture of where the failure starts.
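
Comparing a passing run against a failing run amounts to finding the first
captured state at which the two diverge. The sketch below assumes each run
has been dumped as a list of per-state dictionaries mapping signal names to
values; that capture format and the name `first_divergence` are assumptions
made for illustration, not a description of the on-chip debug hardware.

```python
from typing import Dict, List, Optional, Tuple

Capture = List[Dict[str, int]]  # one {signal name: value} dict per captured state


def first_divergence(
    passing: Capture, failing: Capture
) -> Optional[Tuple[int, Dict[str, Tuple[int, int]]]]:
    """Return (state index, {signal: (passing value, failing value)}) or None."""
    for state, (p, f) in enumerate(zip(passing, failing)):
        diff = {
            name: (p[name], f[name])
            for name in sorted(p.keys() & f.keys())
            if p[name] != f[name]
        }
        if diff:
            return state, diff
    return None


passing_run = [{"r1": 0, "r2": 5}, {"r1": 1, "r2": 5}, {"r1": 2, "r2": 5}]
failing_run = [{"r1": 0, "r2": 5}, {"r1": 1, "r2": 7}, {"r1": 2, "r2": 7}]
print(first_divergence(passing_run, failing_run))  # (1, {'r2': (5, 7)})
```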

Using all of the data gathered, we can begin to see the
overall picture of the failure. We know under what conditions the
failure will occur, including frequency, voltage, temperature,
and process parameters. We know a short code sequence
that will fail, and what CPU features and timing affect the
failure. We have a partial picture of the internal state of the
failure by comparing passing and failing runs.

At the same time that we are gathering data about the failure,
we are developing a new test case for this failure. This new
case will use the known requirements for the failure to
occur, including specific instruction sequences and timing,
specific register values, and cache hits and misses. When
our new case fails, we have all the elements of the failure.

Expanding the Failure Region 

In most instances, the particular failure that occurs in a
random instruction sequence with random data patterns is not
the worst-case failure. We would like to know how severe
the problem is. One of our goals is to find the worst-case
vector. Failures that were previously outside the operating
region usually move into or very close to the operating
region with a worse vector. For example, a speed failure may
occur at a significantly lower frequency when a worst-case
data pattern is used. Or maybe a failure that only occurred
at 40°C will now occur at 20°C. Expanding the failure into
more general shmoo conditions also helps in gathering more
data. (It's not much fun to probe signals in a 40°C oven.)
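
To make the idea of a failure region concrete, the following sketch sweeps a
voltage/frequency grid (a shmoo plot) for two vectors and reports the lowest
failing frequency for each. Everything here is invented for illustration:
the `toy_chip` model, its pattern penalty, and the voltage and frequency
values are assumptions, not PA 8000 measurements.

```python
from typing import Callable, List, Optional, Sequence

PassFn = Callable[[int, int], bool]  # (supply in mV, frequency in MHz) -> passed?


def shmoo(voltages_mv: Sequence[int], freqs_mhz: Sequence[int],
          passes: PassFn) -> List[str]:
    """One text row per voltage: '.' marks a pass, 'X' marks a fail."""
    return ["".join("." if passes(v, f) else "X" for f in freqs_mhz)
            for v in voltages_mv]


def lowest_failing_freq(voltages_mv: Sequence[int], freqs_mhz: Sequence[int],
                        passes: PassFn) -> Optional[int]:
    failing = [f for v in voltages_mv for f in freqs_mhz if not passes(v, f)]
    return min(failing) if failing else None


def toy_chip(pattern_penalty_mhz: int) -> PassFn:
    """Hypothetical model: maximum frequency rises with voltage and is
    reduced by a penalty that depends on the data pattern in use."""
    return lambda mv, mhz: mhz <= mv // 10 - pattern_penalty_mhz


VOLTS = [1800, 2000, 2200]
FREQS = [160, 180, 200, 220]
random_vector = toy_chip(0)
worst_vector = toy_chip(30)

print(shmoo(VOLTS, FREQS, random_vector))  # ['..XX', '...X', '....']
print(shmoo(VOLTS, FREQS, worst_vector))   # ['XXXX', '.XXX', '..XX']
```

With the worse vector the failing region grows toward lower frequencies,
which is the behavior the text describes for a worst-case data pattern.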

Electrical problems are often heavily influenced by data
patterns. For example, driving different data patterns on an
internal bus may increase or decrease the capacitive coupling
or delay of a signal that is causing the failure. It is unlikely
that the random data pattern used in the original failing case
is the absolute worst for this particular failure. Complicating
the matter, it may not be clear what other signals could be
affecting the failing signal.

There are some cases in which the failing signal may not be
appreciably affected by any other signals. In other cases, the
failing signal, which might be part of a bus, may be influenced
by certain data patterns on that bus. For instance, some of
the signals that make up the bus may capacitively couple to
the failing signal in that same bus, slowing down the failing
signal or inverting its value. Occasionally, the biggest
influence on the failing signal is a bus that is functionally
unrelated to the failing signal, but is in close proximity physically.
The same type of capacitive coupling can occur in this case.

Changing obvious data patterns (instructions themselves,
operands from registers, and data results from the ALU or
the cache) is the first step. If none of these seem to affect
the failure, the layout can be consulted to see what buses or
signals run adjacent to or on top of the victim signal. Finding
one or more data patterns that influence the failure also
allows a better understanding of the failure.

Hypothesizing the Cause 

In hypothesizing the cause of the failure, all of the
information that has been acquired about the failure will be used.

The minimum code sequence is especially useful to the
circuit debugger. First, it provides a list of code sensitivities
that either turn the failure on or off or expand the failing
region. Second, this code can be simulated by a switch-level
simulator to give the debugger full observability into the
on-chip circuits being exercised by the test case. By
comparing simulations with and without the code sensitivities, the
exact effect of the code on the circuits is observed.

The switch-level simulator is the first tool used by the circuit
debugger. The exercised circuitry is compared together
with the internal state differences from the passing and
failing data captures to narrow down the circuits involved. This
process yields several potential code experiments that will
continue to narrow the playing field. At this stage in the
debugging cycle, the circuit debugger is working side-by-side
with the system debugger to isolate the failure.

Eventually, the circuit debugger makes a root-cause
hypothesis as to the cause of the failing behavior. This hypothesis
can frequently be supported by the switch-level simulator.
For example, if the hypothesis is that a certain latch fails to
make setup, this latch can be forced to fail in the simulator.
The resulting simulated failure mode should match the actual
failure mode. This is the point when the circuit debugger
moves to SPICE as the main debugging tool.

SPICE is used to simulate the isolated failing circuitry in the
appropriate failing conditions. In the latch example, the
clock and data paths into the latch are accurately modeled
in an attempt to reproduce the failure in SPICE under the
same conditions as on real silicon. Differences are assumed
to be either inaccuracies in the modeling or mistakes in the
root-cause hypothesis. Obviously, these differences need to
be explained before root cause is declared.

If the failure is frequency dependent, another way to validate
the hypothesis is to stretch the specific clock phase during
which we believe the failure occurs. By stretching a clock
phase, we provide more time for the CPU to do the required
work of that phase. For instance, if we have a speed failure
related to one specific phase and we lengthen that one phase
by 10%, the failure should get better.
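
The arithmetic behind the phase-stretch experiment can be sketched as
follows. All numbers are illustrative assumptions: a 180-MHz clock, two
equal phases per cycle, and a critical path of 2.9 ns are chosen only to
show how a 10% stretch can turn a failing phase into a passing one.

```python
def phase_time_ns(freq_mhz: float, phases_per_cycle: int = 2) -> float:
    """Duration of one clock phase, assuming equal-length phases."""
    return 1000.0 / freq_mhz / phases_per_cycle


def meets_timing(path_delay_ns: float, phase_ns: float) -> bool:
    """The work of the phase must complete within the phase."""
    return path_delay_ns <= phase_ns


nominal = phase_time_ns(180.0)   # about 2.78 ns
stretched = nominal * 1.10       # about 3.06 ns
critical_path = 2.9              # hypothetical path delay in ns

print(meets_timing(critical_path, nominal))    # False: speed failure
print(meets_timing(critical_path, stretched))  # True: failure goes away
```

If stretching a different phase leaves the failure unchanged, that is
further evidence that the hypothesis has picked the right phase.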

We continue to gather data and hypothesize causes of the
failure until our hypothesis passes the root-cause test.

Declaring the Root Cause

Often, by inspection of the failing case in the switch-level
simulator or SPICE, sensitivities to local circuit behavior are
predicted. This prediction can then be verified by targeting
code to induce this behavior. For example, we may try to
precondition a particular bus with a specific data pattern
such that it no longer transitions as in the failing case. Or we
may change the instruction timing of the case such that two
events no longer occur in the same cycle. If we can turn the
failure on and off (the light-switch test) by changing one of
the known sensitivities, we have a good understanding of
the failure.

Throughout the debugging cycle, all facts and observations
are documented thoroughly in a bug database. This database
is used to drive experiments to fill in data where it is
missing or to explain observations. The entire weight of this data
is compared to the root-cause hypothesis for consistency.
Any data point in conflict needs to be explained before the
root cause of the bug is considered known. This diligence
to the data has avoided many premature and wrong
root-cause analyses.

Once we have demonstrated a light-switch test and our
hypothesis agrees with all of the data, we are at the root cause.
We believe we fully understand the failure.

We then revisit the worst-case vector analysis one final time.
Even if our final worst-case vector does not move the failure
into the operating region, we may fix the problem anyway,
because it is hard to determine if a small process shift later
in this product's life could move this failure close to or into
the operating region.

The next step is to fix the problem. If we can fix the problem
with only a change in the metal layers, the turnaround time
for new chips is much shorter. To verify that the proposed fix
will actually eliminate the failure, we may do a FIB (focused
ion beam) experiment, in which one of the existing chips is
modified (metal lines are cut and new ones are deposited)
to include the change. The chip is characterized before
and after the FIB change to determine how the failure was
affected. If the failure was eliminated, we have good
confidence in our fix, and we will put our fix into the plan for the
next CPU revision.

Creating the Golden ROM 

After a bug has been closed, its worst-case test is added to
our ROM of all other worst-case tests that have exposed
bugs. We call this ROM the golden ROM. We use the golden
ROM for much of our volume shmoo testing, to serve a
number of purposes. It shows where the current bugs were found
and can show how a fix or certain chip characteristics could
affect these failures. It also lets us know if a bug has been
reintroduced, which happens on occasion. As the golden
ROM grows in size, it naturally gives us more coverage.
Many of our new bugs are found by running the old bug
code in our golden ROM. If a test case has important
coverage that we do not have in our tester screens, golden ROM
tests can be converted into broadside vector tests for our
package screens.

Updating the Methodologies 

When the root cause or problem circuit has been identified,
it often uncovers a flaw in our design methodologies. This
implies that other similar circuits may be used on other parts
of the chip but haven't been discovered yet. The aphorism
"If there's one rat, there are many rats" becomes our motto.
A "many rats" investigation is launched to find a tool-based
method of extracting similar circuits from the chip database
and fixing them if appropriate. Quite often, the failing
circuitry has a unique topology that can be searched for with a
tool. Finally, this flaw in the design methodologies is
documented and the methodologies are updated.

Conclusion 

We continue the electrical verification process until we have
searched our matrix of variables (temperature, frequency,
voltage, process, and test cases) and we can find no more
failures that we believe could move into the operating region.
This process spans multiple chip revisions, with each new
revision fixing one or more failure mechanisms. This process
ensures the long-term quality of the product throughout its
lifespan.

In addition, we analyze the problems that we found and
integrate the solutions to these problems into our design
methodologies so that future products can avoid the same pitfalls
and potentially reach high quality levels more quickly than
previous products.

Solving IC Interconnect Routing for
an Advanced PA-RISC Processor



This paper discusses some important new block routing technologies that
were required for the HP PA 8000 processor chip. These technologies are
implemented in a new block routing system called PA_Route.

by James C. Fong, Hoi-Kuen Chan, and Martin D. Kruckenberg



The design complexities of today's microprocessors have
grown significantly, with the number of transistors climbing
to well over a million, silicon die sizes larger than 1.6 cm²,
clock speeds exceeding 150 MHz, and short design cycles
caused by competition. These issues create tremendous
pressure on design teams and the tools they use. The PA 8000
CPU design team used powerful design automation tools to
achieve their design goals.

Layout of the interconnect metal on the chip is one of the
key components of advanced designs. It is vital to address
the increasingly complex layout problem to achieve smaller
die sizes, higher performance, and quicker time to market.
Since the early 1980s, HP has been working to solve the
top-level IC interconnect problems associated with many of the
larger HP-designed and HP-manufactured ICs. This paper
will discuss some important new block routing technologies
that were required to implement the HP PA 8000
microprocessor. These technologies are embodied in a new in-house
block routing system called PA_Route.

Buy or Build Decision 

Frequently we at the HP Integrated Circuit Business Division
(ICBD) are approached by HP design teams who are about
to embark on the design of a new chip to be manufactured
by ICBD. We are asked to enhance our routing technology
to address issues critical to the chip's successful routing.



This was the case when the PA 8000 design team
approached us with some ideas for new features needed to
take advantage of new technologies. Like most aggressive
designs, they were pushing the limits of every technology
where they thought they could get a significant return on
investment. Block routing was one area they thought they
could improve.

Our existing block router is called HARP (Hewlett-Packard
Automatic Routing and Placement). HARP had been
evolving for over a decade and had some legacy code that was
becoming difficult to extend.

Using customer surveys to complement our own knowledge,
we did a detailed analysis of various existing block routers.
We wanted to see how they address this new class of block
routing problems. Design teams are hesitant to switch
layout tools unless alternatives can be found that match
their design requirements well enough to justify the risks of
switching tools and the cost of the new tool, not only the
capital cost but also the cost of learning how to use the tool
effectively. Using radar charts (see Fig. 1), we were able to
determine that the less aggressive style of chips represented
by the PA 7100LC processor were well-suited for the existing
HARP system. The more aggressive style of chips represented
by the PA 8000, however, did not map to any existing block
router offerings.

Fig. 1. Radar charts showing (a) the capabilities of HP's existing HARP block router and third-party routers and
(b) the needs of less aggressive chips like the HP PA 7100LC and more aggressive chips like the HP PA 8000.

Significant changes in functionality and features usually
entail a high amount of risk. On any design it is critical to
manage the risk. Being an internal supplier, we are able to
work much more closely with our customers by giving them
greater visibility and control of the risks involved. This level
of access is generally not available when dealing with
third-party tool providers.

ICBD, as HP's internal chip supplier, is in the business of
making and selling chips, not tools, so we face stricter
requirements to justify any internal tool development. It
is rarely cost-effective to build a router for a single chip.
However, it has been our experience that the
microprocessor chips have always pushed the limits of the
technologies and the more general ASIC chips follow later after the
bumps have been smoothed out. In looking at what was
special about the PA 8000, we spotted several new technology
trends that radically changed the block routing problem and
might be adopted by future ASIC chips.

First, fabrication processes are starting to add many more
layers of metal for interconnect. This change begins to
invalidate the basic model used by traditional block routers of
separate routing channels and blocks. Second, the need for
higher off-chip connectivity is forcing a change in packaging
technology. Solder bump packaging looks like the most viable
means of addressing that need. Solder bump packaging is
also being looked at for reducing packaging cost by mounting
chips directly to boards. However, having solder bump pads
in the middle of the chip breaks the traditional block router
model of placing the pads at the periphery of the chip. Last
but by no means least is a general trend of wiring delays
becoming more significant than gate delays. Thus, the
emphasis of routing is switched from minimizing chip area
to minimizing interconnect delay.

Working with our PA 8000 customers, we prioritized the
features and came up with a manageable subset needed
for them to be successful. We then formulated a proposal to
build the PA_Route block routing system. Given the time
constraints and the ambitious goals, we had to take the drastic
step of freezing the old system, HARP, with minimal support.
We got agreement from all parties on the basis of strong
support from the PA 8000 developers.

New Technologies Lead to New Constraints

The PA 8000 design team had decided that to be competitive,
the PA 8000 chip would not only be more aggressive in its
design, using superscalar, out-of-order instructions, but
would also use a new process and new packaging. It is not
uncommon for a microprocessor to use a new process, but
this time they were moving from a three-metal-layer process
to a five-metal-layer process. In addition, the increased I/O
requirements of the design ruled out conventional packaging.
The only mature packaging approach available was solder
bump technology. With solder bump technology, the I/O
pads are spread across the whole chip and are not just
restricted to the periphery.

Analysis of the possibilities for extending or adapting the
existing block router in the HARP system showed that its
basic design intentions were not well matched with the new
requirements and could not be changed to make full use of
the new technologies. HARP was based on the traditional
channel routing paradigm, in which there are expandable
routing channels between solid blocks. The channel router
was limited to routing in three metal layers, while the new
process had five. The solder bump I/O pads could be
anywhere, but the block router could only attach to ports on the
edges of a block.

Another more important restriction was that the solder bump
port frame could not change and the blocks could not move
with respect to the pads. There were two reasons for this.
First, the board to which the PA 8000 was to be connected
had a relatively long lead time, so its designers could not
wait for the chip to be completed before getting started.
Secondly, the solder bumps used to connect to the I/O pads
emit alpha particles. If the placement were changed so that
the pads became coincident with sensitive circuitry then
unpredictable circuit behavior could result.

These requirements meant the block router could not grow
the channels or move the blocks, a constraint for which our
existing block router had only weak support. We jointly
decided it was not feasible to attempt to automate the routing
of the fifth layer of metal, since it was overly complicated by
the requirements of the solder bump I/O pads.

In PA_Route we addressed as many of the new requirements
as we could in the time available. We worked with the
PA 8000 design team to pick the most important issues to
address. This came down to two major features: being able
to use the third and fourth metal layer resources over the
top of some of the blocks and being better able to control
the growth of the placement.

Our team is not often given the time to redesign our system,
so we took advantage of the opportunity to add some
long-desired capabilities. We added a more sophisticated port
and net model, which we call foliage. With foliage we can
describe the electrical characteristics of a port and a net.
Pieces of artwork representing a port, for example, can be
considered electrically equivalent (allowing stitching),
electrically resistive (allowing connection to one of many but
without stitching), or electrically open (specifying that all
pieces of artwork must be connected). Foliage allows the
router to be more flexible in using ports, since it uses this
electrical model of the ports and it allows for more complex
routing of nets in a channel. We also took the time to use
more advanced software development techniques. We
switched our design style from structured custom
programming in the Ada language to object-oriented programming in
the C++ language. This allowed us to attempt more complex
algorithms and reuse existing component libraries.
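
One way to picture the three port classes is as an enumeration with a rule
for which artwork pieces a route may touch. This is a minimal sketch based
only on the description above. PA_Route itself was written in C++ and Ada,
and the names `Foliage`, `Port`, and `connection_is_legal`, as well as the
exact legality rules, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Foliage(Enum):
    EQUIVALENT = "equivalent"  # any pieces may be used; stitching is allowed
    RESISTIVE = "resistive"    # connect to exactly one piece; no stitching
    OPEN = "open"              # every piece must be connected by the router


@dataclass
class Port:
    name: str
    pieces: List[str]          # artwork pieces that make up the port
    foliage: Foliage


def connection_is_legal(port: Port, used_pieces: Iterable[str]) -> bool:
    """Check a proposed set of connected pieces against the port's class."""
    used = set(used_pieces)
    if not used or not used <= set(port.pieces):
        return False
    if port.foliage is Foliage.EQUIVALENT:
        return True
    if port.foliage is Foliage.RESISTIVE:
        return len(used) == 1
    return used == set(port.pieces)  # Foliage.OPEN


clk = Port("clk_in", ["m2_left", "m2_right"], Foliage.RESISTIVE)
print(connection_is_legal(clk, ["m2_left"]))              # True
print(connection_is_legal(clk, ["m2_left", "m2_right"]))  # False
```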

The Building of PA_Route

The PA_Route system is composed of many components,
including a netlist reader, an artwork reader (which models
obstacles), a global router, a channel scheduler, and a
detailed router. A viewer is used to examine intermediate
and final results. Eventually the artwork is produced and
then verified. Even though the design time of the PA 8000
is long compared to most ASIC chips, we did not have time
to rewrite the whole system, so only three main parts were
designed and implemented from scratch: the main database,
the global router's over-the-block grid model, and the new
over-the-block detailed router. We leveraged the rest of the
system from the old HARP system with modifications to
interface with the new database.

Global Routing: A Block-Level Problem

For a given level within a chip hierarchy, the routing plane is generally
occupied by a number of blocks with ports that need to be connected by
physical wires or nets. The blocks are usually restricted to be rectilinear
in shape but are allowed to vary in size. The space between the blocks
is generally reserved for routing and is usually subdivided into adjacent
routing regions or channels so that the routing problem can be solved
with a divide-and-conquer approach. Within each region, a certain
number of layers are reserved for routing the signals at the given level.
Global routing is the first step in the routing process. Its job is to generate
a routing plan in which each signal is assigned to a number of routing
regions. The objectives for the global router are to achieve 100%
assignment of signals to available routing regions, to minimize the overall chip
size, and to ensure that the timing requirements of the signals are met.

The global routing problem is generally represented by a global routing
graph, which depicts the relationships between routing regions and ports
to be connected. The edges of a global routing graph represent the
routing regions and the nodes represent the intersections between regions
and the ports. The edges are assigned weights, which can be the
distance between two region intersections, the distance between a port
and the nearest region, the cost for using a particular layer in the region,
a penalty for switching routing layers, or a penalty for overflowing a
region. Global routing is accomplished by implementing a lowest-cost
path-finding algorithm on the global routing graph.
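
A lowest-cost path search over such a weighted graph can be sketched with
Dijkstra's algorithm. The sidebar does not say which algorithm PA_Route
uses, so Dijkstra is a stand-in here, and the node names, edge weights, and
the overflow penalty in the example are all made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def add_region(graph: Graph, a: str, b: str, length: float,
               overflow_penalty: float = 0.0) -> None:
    """Add an undirected edge (a routing region) with its total weight."""
    weight = length + overflow_penalty
    graph.setdefault(a, []).append((b, weight))
    graph.setdefault(b, []).append((a, weight))


def lowest_cost_path(graph: Graph, source: str,
                     target: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; assumes all edge weights are non-negative."""
    best = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    raise ValueError(f"no path from {source} to {target}")


g: Graph = {}
add_region(g, "P1", "A", 1)                      # port to nearest region
add_region(g, "A", "B", 4, overflow_penalty=10)  # short but congested region
add_region(g, "A", "C", 3)
add_region(g, "C", "B", 3)
add_region(g, "B", "P2", 1)
print(lowest_cost_path(g, "P1", "P2"))  # (8.0, ['P1', 'A', 'C', 'B', 'P2'])
```

The overflow penalty steers the net around the congested region even though
that region is geometrically shorter.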



Fig. 1. Global routing graph. Regions are indicated by dashed outlines.



To minimize development time, we partitioned our
development team into two parallel groups. One group started on
the new database and started porting the old programs,
while the other started implementing some of the new global
router features in the old system. When most of the old
HARP system was ported we ported the modified global
router. This meant that the global router stayed in structured
custom Ada code, which was the language used in HARP.

Database Changes 

The capability needed by the PA 8000 design to route over
the blocks required us to improve the expressiveness of the
underlying database models used in routing regions. The new
database allows us to model the obstacles and internal ports
that we see in these over-the-block regions. The advanced
port and net models (foliage) we implemented also required
significant changes to the database. This not only allows
us greater control and flexibility in routing, but also allows
us to separate the act of global routing from the channel
scheduler that calls the detailed router.

We developed automatic code generation technology to
transform a graphical model of the database into code. The
code generation technology was extended to support the
C++ language and we began to work on the new input and
output programs. With the database changes completed,
we could begin porting the old HARP programs to the new
database.

Global Routing 

The general global routing problem is described in the
sidebar above. PA_Route incorporates a global router that
understands rectangular blocks. The global router needed to be
extended to support L-shaped blocks for the PA 8000. An
L-shaped block is cut either horizontally or vertically into two
rectangular components and special control is imposed
on the channel between the cut components to keep the
components linked together. The routing plane is divided
into rectangular routing regions that meet only at
T-intersections such that only two sides of a routing region have
constrained ports. This somewhat restricted routing model
comes from a conscious decision to avoid situations in which
a routing region becomes a "switchbox" with ports on all four
sides constrained to fixed locations. The more constrained
switchbox routing problem generally requires more run
time, creates more constraint cycles, demands clever rip-up
and reroute strategies, and tends to leave more shorts for
manual repair. We opted instead to concentrate our effort on
providing more flexibility in the PA_Route global router for
meeting user requirements and for achieving the smallest
possible overall chip area.
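
The cutting of an L-shaped block into two rectangles can be sketched as
follows. The representation is an assumption made for this example: the L
shape is given as an outer bounding rectangle plus a rectangular notch in
one corner, and `Rect` and `cut_l_block` are hypothetical names. The special
handling of the channel between the two pieces is not modeled.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def cut_l_block(outer: Rect, notch: Rect, horizontal: bool) -> Tuple[Rect, Rect]:
    """Split (outer minus corner notch) into two rectangles.

    A horizontal cut yields a full-width piece plus a narrower piece;
    a vertical cut yields a full-height piece plus a shorter piece.
    """
    at_right = notch.x1 == outer.x1
    at_top = notch.y1 == outer.y1
    in_corner = ((at_right or notch.x0 == outer.x0)
                 and (at_top or notch.y0 == outer.y0))
    if not in_corner:
        raise ValueError("notch must sit in a corner of the outer rectangle")
    # Ranges of x and y that lie beside the notch rather than inside it.
    keep_x = (outer.x0, notch.x0) if at_right else (notch.x1, outer.x1)
    keep_y = (outer.y0, notch.y0) if at_top else (notch.y1, outer.y1)
    if horizontal:
        full = Rect(outer.x0, keep_y[0], outer.x1, keep_y[1])
        stub = Rect(keep_x[0], notch.y0, keep_x[1], notch.y1)
    else:
        full = Rect(keep_x[0], outer.y0, keep_x[1], outer.y1)
        stub = Rect(notch.x0, keep_y[0], notch.x1, keep_y[1])
    return full, stub


outer = Rect(0, 0, 10, 8)
notch = Rect(6, 5, 10, 8)  # top-right corner removed
print(cut_l_block(outer, notch, horizontal=True))
print(cut_l_block(outer, notch, horizontal=False))
```

Either cut covers the same total area; which one is preferable would depend
on where the resulting internal channel is easier to keep closed.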

The old HARP global router, like any traditional global router,
assumes that blocks are black boxes, that the points for
connections are on the edges of the black boxes, and that
routing is confined to the channel areas between the blocks.
That is, all routing resources inside the blocks are dedicated
to the blocks' internal implementation only and therefore
routing at the global level is not allowed to traverse through
the blocks. This simplistic assumption was largely accurate
in the days of two-layer and three-layer IC processes. With
the advent of the HP CMOS 14 process, which can have up
to five routing layers, the assumption that routing resources
inside a block are dedicated only to the block is no longer
realistic. For the PA 8000, a good amount of metal 3 and
metal 4 resources inside some child blocks are available for
routing global nets.

Being able to use such over-the-block routing resources can
lead to improved signal timing, decreased channel congestion,
and ultimately smaller overall routed chip size. Having judged
that over-the-block routing was a critical factor to the success
of PA 8000, the PA_Route team undertook a revolutionary
change in the global router to support the routing of global
nets over any block, provided that there are routing resources
available over the block. The traditional global routing graph
was augmented with a virtual grid model over each child
block, a sophisticated net flow optimizer, and an efficient
routing resource estimator. The grid model allows the
lowest-cost path of a global net to traverse through any region over
a block as long as there are free routing resources. The global
router builds a detailed model of routing resources in each
region (channel or block) and tracks free spaces in the
regions based on a sophisticated density estimator that
understands obstacles. The net flow optimizer minimizes jogging
and distributes unavoidable jogs of different nets to different
regions to reduce congestion. For connecting to the new
solder bump I/O pads, which are inside some child blocks,
the new global router was extended to support ports inside
any block, with the restrictions that the ports be on selected
port layers and that there be available routing resources in
the block. The global router takes care of avoiding obstacles
and ports on the edges of a block when inside ports are
brought out of a block to form a lowest-cost path. The
net flow optimizer also plays an important role in choosing
an optimal exit point for the inside port so as to reduce
unnecessary jogging.

The predetermined solder bump I/O pad locations for the
PA 8000 force the placement to be unperturbed during
routing. This is a hard problem for the global router. Not having
the luxury of actually routing the nets during global routing,
utilization of routing resources over the block as well as
in the channel regions is controlled using a close estimate
of detailed routing. A reasonably accurate and fast density
estimator was incorporated into the PA_Route global router.
Since routing over the block is allowed, the density estimator
must understand prerouting and obstacles. A density check
phase was introduced after the evaluation of the lowest-cost
path of a net. If the path would exceed the routing capacity
of one or more regions, the grid model in the global routing
graph is modified to forbid further routing through the
congested portion of the regions and an alternate lowest-cost
path is sought. This checking process is repeated until either
a clear path is found or no clear path is found, in which case
the net is left unrouted to preserve the fixed placement.
In addition to performing accurate density calculations, the
global router also attempts to achieve minimal placement
perturbation by automatically assigning nets to the less-dense
layer wherever there is more than one routing layer
available, and by optimizing net flow at region interfaces to
reduce routing congestion.
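
The density check loop described above can be illustrated with a small sketch. This is not the PA_Route implementation: it assumes a uniform grid of cells in place of the global routing graph, a simple per-cell capacity in place of the density estimator, and Dijkstra's algorithm for the lowest-cost path. All function and parameter names are hypothetical.

```python
import heapq

def lowest_cost_path(cost, blocked, start, goal):
    """Dijkstra search on a grid. Returns a list of cells, or None."""
    if start in blocked or goal in blocked:
        return None
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            path = [cell]
            while cell in prev:
                cell = prev[cell]
                path.append(cell)
            return path[::-1]
        if dist > best.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or nxt in blocked:
                continue
            nd = dist + cost[nr][nc]
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                prev[nxt] = cell
                heapq.heappush(heap, (nd, nxt))
    return None

def route_with_density_check(cost, capacity, usage, start, goal):
    """Find a lowest-cost path, then check it against cell capacities.

    Congested cells on the path are forbidden and the search is repeated.
    Returns the path, or None if the net must be left unrouted.
    usage maps a cell to the number of nets already assigned to it.
    """
    blocked = set()
    while True:
        path = lowest_cost_path(cost, blocked, start, goal)
        if path is None:
            return None  # no clear path: leave the net unrouted
        congested = {cell for cell in path
                     if usage.get(cell, 0) >= capacity[cell[0]][cell[1]]}
        if not congested:
            for cell in path:
                usage[cell] = usage.get(cell, 0) + 1
            return path
        blocked |= congested  # forbid the congested cells and retry
```

Each retry forbids at least one new cell, so the loop always terminates, either with a clear path or with the net left unrouted.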

For multipin nets, for which connectivity can have significant
performance and density implications, port foliage was added
to the PA_Route database to give the global router a model
for determining port equivalency, while net foliage was
introduced to allow the global router to generate more
sophisticated physical connectivity for the shortest path. This
combination of port and net foliage results in a high degree of
control over the physical connectivity of a net. A designer
can specify foliage explicitly or allow the PA_Route global
router the freedom to optimize the global route by creating
foliage as necessary to minimize the total wire length and to
avoid congestion.

Over-the-Block Routing

To handle two important aspects of the PA 8000, a new
over-the-block detailed router was required. The router had to
handle obstacles in any routing layer and it had to be able to
connect to ports located anywhere within the routing region.



Detailed Routing Methods 

Detailed routing has generally evolved out of four basic approaches:
maze routing, line probe routing, left-edge routing, and greedy channel
scanning. The problem is formulated as a routing area containing
connection points or pins on a (usually rectangular) region or
channel. Pins can be located on the four sides of the region or
within the region. The connection points are generally constrained to
reside in certain layers to make them easier to connect to.

Even single-layer routing problems are NP-complete, which means that
an optimal solution cannot be achieved in a reasonable time. For this
reason, detailed routing solutions are heuristic in nature. The factors in
determining a solution's usability are the number of terminals, net width,
via restrictions, boundary shape, number of layers, and net types such as
power, ground, and clock wires.

Maze routers abstract the channel routing problem with a grid-based
model. Wires are restricted to follow paths along the grid lines.
Routing is accomplished by laying down wires on the grid one at a time.
Obstacles are modeled as disallowed portions of the grid. Therefore,
maze routing can handle arbitrary obstacles.
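
As an illustration of the maze routing idea, the following minimal sketch uses a breadth-first wave expansion on a single-layer grid. It is a textbook-style example under simplifying assumptions (one layer, unit wire width, two-pin nets), and the function names are hypothetical.

```python
from collections import deque

def maze_route(grid, source, target):
    """Breadth-first search from source to target on a grid.

    grid[r][c] == 1 marks a disallowed cell (obstacle or earlier wire).
    Returns the shortest path as a list of cells, or None if blocked.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {source: None}
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            path = []
            while cell is not None:  # trace back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in prev):
                prev[nxt] = cell
                frontier.append(nxt)
    return None

def route_nets(grid, nets):
    """Lay down wires one at a time; each routed wire becomes an obstacle."""
    routes = {}
    for name, (src, dst) in nets.items():
        path = maze_route(grid, src, dst)
        routes[name] = path
        if path:
            for r, c in path:
                grid[r][c] = 1
    return routes
```

Because each routed wire blocks later ones, the result depends on the order in which nets are routed, which is the net ordering difficulty mentioned in the main article.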

Line probe routers scan in the x and y directions searching for line
segments from either the source or the destination. Scan lines do not project
beyond obstacles, so obstacles are avoided by a subsequent probe of the
line segments orthogonal to the ones from the previous pass.

Left-edge routers sort wires by the boundary formed by the leftmost and
rightmost pins. They order wires one at a time using a greedy method that
places segments into tracks. They fill tracks one at a time, packing
segments to minimize unused space in a track. The route is complete when
all wires have been assigned to a track.
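
The left-edge method can be sketched in a few lines. This minimal version assumes each wire is described only by the columns of its leftmost and rightmost pins and ignores vertical constraints between pins in the same column. The names are hypothetical.

```python
def left_edge(intervals):
    """Assign horizontal wire segments to tracks with the left-edge method.

    intervals maps a net name to (leftmost_pin, rightmost_pin).
    Returns a dict mapping each net name to a track index (0 = first track).
    """
    # Sort wires by their left edge.
    unassigned = sorted(intervals, key=lambda net: intervals[net][0])
    tracks = {}
    track = 0
    while unassigned:
        last_right = None
        remaining = []
        for net in unassigned:
            left, right = intervals[net]
            # Pack the next wire that starts beyond the last one placed.
            if last_right is None or left > last_right:
                tracks[net] = track
                last_right = right
            else:
                remaining.append(net)
        unassigned = remaining
        track += 1  # current track is full; open the next one
    return tracks
```

For example, with the intervals a = (0, 3), b = (1, 5), c = (4, 7), d = (6, 9), and e = (8, 10), wires a, c, and e share the first track and wires b and d share the second.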

Greedy channel routers divide the channel into horizontal tracks and
vertical columns. This approach works on one vertical column at a time,
scanning from left to right. The approach is termed "greedy" because
each column is optimized individually, although the entire channel is not
guaranteed to be optimal. The greedy router sweeps from column to
column, trying to join segments of nets assigned to multiple tracks. The
greedy channel scan is capable of providing fast solutions but cannot be
easily extended to handle arbitrary obstacles.

The over-the-block detailed router used in PA_Route uses a completely
different approach based on a graph. The graph represents horizontal
and vertical constraints of the wires.



Restrictions on the topology of the obstacles and ports were
negotiated with the PA 8000 design team to relax the
constraints of the over-the-block router to meet an aggressive
schedule.

Child blocks were constructed by lower-level composition
teams. Their design used the lower layers of metal to perform
local interconnect and the upper levels of metal for
intermediate levels of interconnect. The result was that partially
used layers were made available to the over-the-block router
to complete the intermediate and global level interconnect.
The over-the-block router was given the responsibility of
avoiding artwork created by the lower-level composition
teams.

A review of existing detailed routing algorithms is presented
on page 43. The PA_Route over-the-block router is built on a
new routing algorithm. It is based on a channel-like paradigm
although it handles obstacles with arbitrary configurations.
Layers are generally assumed to run in either the x direction
or the y direction. Wires are routed assuming one direction
is preferred, that is, in the preferred direction the wire runs
for a longer distance and carries most of the current between
a source and its sinks. By handling obstacles in arbitrary
configurations, the over-the-block router extends channel
routing concepts into an area-based routing regime. It
retains many of the benefits of channel routing while being
flexible enough to handle more complex routing topologies.
This type of routing methodology will become more common
as more layers are made available for routing. The
over-the-block router can connect to ports not only on the sides of
routing regions, but also in the middle of a routing region.
The over-the-block router supports variable wire width and
spacing, which gives the designers greater control over the
timing delays of a signal. The over-the-block router reads the
layout rules directly and does not abstract them into arbitrary
routing constraints. Unlike other algorithms, our proprietary
over-the-block algorithm does not require wires to be
"binned" according to their width and spacing, and it does
not rely on a compaction process to achieve optimal density.

The over-the-block router contains the features needed for
high-speed, performance-driven critical designs such as the
PA 8000. It handles complex via structures necessary for
high-performance designs by allowing the intersection area
to be expanded beyond the size dictated by individual metal
connections. While it supports a high degree of manual
control, the over-the-block router is also reasonably fast, making
multiple design turnarounds feasible.

The over-the-block router models the routing problem as
two-dimensional line segments that represent the largest wiring
component of a net. This trunk is assumed to cause most of
the parasitic delay and the overall goal of the algorithm is to
find an optimal ordering of these trunks to generate a dense
packing and avoid obstacles. Each trunk and each obstacle
becomes a node in a graph. The edges in the graph model
the vertical pin constraints of each wire and the horizontal
constraints of that trunk's placement relative to other trunks
(see Fig. 2). The total graph contains the weighted
constraints of all trunks in the routing region. Thus, each trunk is
considered for placement during each phase of edge
direction assignment, and the net ordering difficulties of other
routing schemes are avoided. The general nature of the edge
selection allows other constraints such as cross talk and
delay to be modeled in future versions.

Fig. 2. To the over-the-block detailed router, each wiring
trunk is a node in a graph (a). The edges in the graph model
the vertical pin constraints of each trunk and the horizontal
constraints of that trunk's placement relative to other trunks.
(b) Routing plan. Shades of gray represent different metal layers.
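
The idea of trunks as graph nodes with constraint edges can be illustrated with the classic vertical constraint graph from channel routing. The sketch below is not the proprietary over-the-block algorithm: it assumes a simple two-sided channel, ignores obstacles, horizontal constraints, and edge weights, and uses a plain topological sort to order the trunks. All names are hypothetical.

```python
from collections import defaultdict, deque

def vertical_constraints(top_pins, bottom_pins):
    """Build edges 'trunk A must lie above trunk B'.

    top_pins[i] and bottom_pins[i] name the nets with a pin in column i
    on the top and bottom edges of the channel (None if no pin).
    """
    edges = defaultdict(set)
    for top, bottom in zip(top_pins, bottom_pins):
        if top and bottom and top != bottom:
            edges[top].add(bottom)
    return edges

def order_trunks(nets, edges):
    """Return the trunks ordered top to bottom, or None on a constraint cycle."""
    indegree = {net: 0 for net in nets}
    for above in edges:
        for below in edges[above]:
            indegree[below] += 1
    ready = deque(sorted(net for net in nets if indegree[net] == 0))
    order = []
    while ready:
        net = ready.popleft()
        order.append(net)
        for below in sorted(edges.get(net, ())):
            indegree[below] -= 1
            if indegree[below] == 0:
                ready.append(below)
    if len(order) != len(nets):
        return None  # cycle: the wire topology must be altered, e.g. with a jog
    return order
```

For top pins (a, b, none, c) and bottom pins (b, none, c, a), the ordering from top to bottom is c, a, b. Two nets whose pins face each other in swapped columns form a constraint cycle, and no ordering exists until one of the wires is altered.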

The algorithm can handle any number of layers and is not
rigidly required to follow layer-per-direction constraints for
vertical components (i.e., connections to ports) or trunk
components. When constraints occur, the over-the-block
router tries several schemes to alter the topology of the wire,
such as removing the constraints. The scheme includes jog
insertion and wrong-side segmenting in various forms.

If the over-the-block router cannot complete a route, it
produces a spacing violation or short circuit along with
associated diagnostics and completes the route. When this occurs,
the user has the option of fixing the short manually or
altering the routing problem for the region by such methods as
improving the placement, constraining the global routing with
capacity controls, or other means.

Although specialized to handle the routing problems of the
PA 8000, the over-the-block router was built to handle the
general channel routing problem. No shortcuts were taken
that would compromise robustness for the general case in
the expectation that the router could be leveraged for other
designs.




Fig. 3. PA 8000 CPU chip with highlighted areas showing
where PA_Route performed block-level routing.

Conclusion

A new block routing system called PA_Route was built
specifically to address the needs of high-performance,
leading-edge IC designs. PA_Route contains significant features
built on new technology while leveraging existing code to
minimize risk. It was designed to be extendable to address
future issues as they arise. It was used successfully to route
the PA 8000 chip and did not impact its schedule despite the
high levels of risk involved. Fig. 3 shows the areas of the
PA 8000 chip where PA_Route performed block-level routing.
The features and limitations of the system were carefully
designed with close cooperation between the PA 8000 design
team and the CAD development team. Many alternatives
were analyzed using design-critical issues as the
measurement criteria. Balancing immediate and future chip design
needs was given high importance in the design of PA_Route
so that the system can continue to be used for future designs.

Intelligent Networks and the HP
OpenCall Technology

The HP OpenCall product family is a portfolio of computer-based
telecommunications platforms designed to offer a foundation for
advanced network services based on intelligent network concepts. This
article concentrates on the HP OpenCall service execution platform,
service management platform, and service creation environment.

by Tarek Dehni, John O'Connell, and Nicolas Raguideau



Intelligent networks are an expanding area within the
telecommunications industry. The adoption of intelligent
network technology has been driven by its ability to allow
telecommunications network operators to install and provision
new, revenue-generating communication services in their
networks. With these services installed within the network,
the extra functionality they provide can easily and
instantaneously be made available to the whole customer base.
Examples of such services are the freephone services (the
cost of the telephone call is paid by the called party), credit
card calling, and the CLASS services (custom local area
signaling services) in North America.

At the same time, the standardization of some key interfaces
within the telecommunications network has allowed greater
competition between network equipment providers, offering
the possibility of genuinely multivendor networks. The
opening up of previously proprietary switch interfaces has
made it easier for network operators to add new functionality
to their networks, since this functionality can now be
implemented outside the switch, often on industry-standard
computer platforms. Today, with the emergence of new fixed
and mobile network operators in many areas of the world,
two new drivers for intelligent networks have emerged.
Firstly, there is the need for interoperability between these
networks. Secondly, operators seek to differentiate themselves
on their service offerings. Both imply an even stronger
requirement to support extra intelligence in the network. This
will ensure the continued demand for more open and
flexible intelligent network solutions.

Hewlett-Packard's product strategy for the intelligent
network market is based on the HP OpenCall product family, a
portfolio of computer-based telecommunications platforms
designed to offer a solid foundation for competitive,
revenue-generating services based on intelligent network
architectures. This article concentrates on the HP OpenCall service
execution platform, service management platform, and
service creation environment, with particular emphasis
on the architecture and design of the service execution
platform. The HP OpenCall SS7 platform is described in the
article on page 58.



In this paper, we introduce the key concepts in intelligent
networks including the role of standardization, we explore
the system requirements for a class of intelligent network
elements (those elements targeted by the HP OpenCall
platforms), and we highlight the key aspects of the design of the
HP OpenCall platforms.

Intelligent Networks 

The telephony service is very simple. Its objective is the
transport of speech information over a distance in real time.
Telephony networks were originally designed with the
assumption that the same service would be offered to all
users, and this held true for a long time. Users could select
one of a range of destinations and be called by other users.
Over the years the focus of telephony service providers has
been to improve the technology to offer these basic services
to a larger number of customers, and over longer and longer
distances. At the same time, terminals have become mobile,
with mobile phone users demanding the same levels of
services.

As a consequence of this evolution, today's telephony
networks consist of a mix of more or less integrated
technologies and networks that have been deployed over more than
30 years, forming a very complex and large-scale global
infrastructure.

In this context, the task of provisioning a new service in an
operator's network is extremely complex and may vary
considerably depending on the network infrastructure. The
conventional method of continually integrating these new
functions into public exchanges is costly and lacks flexibility,
making it difficult for network operators to compete
effectively in an increasingly competitive environment.

This situation has led network operators and their suppliers
to look for a better approach, in which control functions and
data management linked to the creation, deployment, and
modification of services can evolve separately from the basic
switching or existing functions of an exchange. They
assigned standards organizations (ITU-T, ETSI, Bellcore) the
responsibility of defining an architectural framework for the
creation, execution, and management of network services.

Fig. 1. The Intelligent Network Conceptual Model of the ITU-T is a
framework for describing and specifying intelligent network systems.
Plane labels in the figure:

• Service Plane: Extraction of service features to clarify service
requirements. S = Service, SF = Service Feature.

• Global Functional Plane: Service execution description making use
of SIBs and BCP. BCP = Basic Call Process, SIB = Service Independent
Building Block, POI = Point of Initiation, POR = Point of Return.

• Distributed Functional Plane: Basic Call State Model (BCSM) models
basic call process. Description of service execution.
FE = Functional Entity.

• Physical Plane: Physical implementation scenarios for intelligent
network functionality. Define protocol INAP (Intelligent Network
Application Protocol). PE = Physical Entity.



Intelligent Network Conceptual Model

The ITU-T (International Telecommunications Union,
Telecommunications Standardization Sector) developed the
Intelligent Network Conceptual Model to provide the
framework for the description of intelligent network concepts and
their relations. The Intelligent Network Conceptual Model
consists of four planes, each of which is a different
abstraction of the telecommunications network. The ITU-T also
planned the specification of the target intelligent network
architecture through several study periods, thereby enabling
incremental implementations. These successive standardized
functions are referred to as intelligent network capability
sets (see Fig. 1).

Service Plane. The service plane describes services and the
service features as seen from a user perspective. A service
feature is the smallest part of a service that can be
perceived by a user. The service plane does not consider how
the service is implemented or provisioned in the network.

Global Functional Plane. The global functional plane
describes the design of services as a combination of service
independent building blocks. Service independent building
blocks give a model of the network as a single entity, that is,
there is no consideration of how the functionality is
distributed over the network.

A specific service independent building block is the basic
call process, which corresponds to the basic call service. It
has points of initiation and points of return. An instance of
a service logic can be called from a point of initiation, and
after execution of the service logic, the basic call process
is recalled in a point of return. Service logic corresponds to
services or service features in the service plane.
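
The relationship between the basic call process, points of initiation, and points of return can be pictured with a toy sketch. It is only an illustration of the concept and not part of any standard or of the HP OpenCall platform. The trigger prefix "0800", the telephone numbers, the point-of-return names, and all function names are assumptions made for the example.

```python
def freephone_logic(call, database):
    """Hypothetical service logic: translate a freephone number.

    Returns the name of a point of return and the updated call record.
    """
    translated = database.get(call["dialed"])
    if translated is None:
        return "release", call
    # The called party pays for a freephone call.
    return "continue", dict(call, destination=translated, charged_party="called")

def basic_call_process(call, triggers, database):
    """Toy basic call process with one point of initiation.

    triggers maps a dialed-number prefix to a service logic function.
    """
    # Point of initiation: the dialed digits have been analyzed.
    for prefix, logic in triggers.items():
        if call["dialed"].startswith(prefix):
            point_of_return, call = logic(call, database)
            break
    else:
        # No service logic applies: ordinary basic call.
        point_of_return = "continue"
        call = dict(call, destination=call["dialed"], charged_party="calling")
    # Point of return: the basic call process resumes here.
    if point_of_return == "release":
        return dict(call, state="released")
    return dict(call, state="connected")
```

A call to a number starting with the assumed prefix leaves the basic call process at the point of initiation, runs the freephone logic, and re-enters at the point of return chosen by that logic. Any other call completes without invoking service logic.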

Distributed Functional Plane. The distributed functional
plane (Fig. 2) gives a distributed functional view of the
network. Functional entities are groupings of functionality
that are entirely contained in a physical entity. In other
words, they cannot be split among several physical entities.
The distributed functional plane describes the functional
entities together with their relationships.

The identified functional entities are as follows:

• The call control access function models the interface with
the end-user terminal.

• The call control function provides call and connection
control, that is, basic call processing.

• The service switching function models the call control
function as seen from the service control function.

• The service control function provides the logic and
processing capabilities for intelligent network-provided services. It
interacts with the service switching function to modify the
behavior of the basic call, and with the two entities below
to access additional logic or obtain service or user data.

• The service data function contains customer and network
data for real-time access from the service control function.

• The specialized resource function provides additional
specialized resources required for intelligent network-provided
services, such as dual-tone multifrequency (DTMF)
receivers, announcements, conference bridges, and so on.

Finally, on the management side, three functional entities
are defined:

• The service management function allows for deployment
and provisioning of intelligent network services and for
support of ongoing operations. Its management domain can
cover billing and statistics as well as service data.

• The service creation environment function allows new
services to be safely and rapidly defined, implemented, and
tested before deployment.

• The service management access function provides the
interface between service managers and the service management
function.

It is envisioned that the service independent building blocks
specified in the global functional plane will be realized in the
distributed functional plane by a sequence of coordinated
actions to be performed by various functional entities.

Physical Plane. The physical plane describes the physical
alternatives for the implementation of an intelligent network.
The identified possible physical nodes include service control
points, switches, and intelligent peripherals.

The information flows between functional entities contained
in separate physical entities imply a need to specify and to
standardize the interfaces and protocols between these
separate physical entities.

The following protocols have been defined by the ITU-T:

• The ISDN User Part (ISUP) and the Telephony User Part
(TUP) are instantiations of the information flows between
call control functions.

• The Intelligent Network Application Protocol (INAP) covers
the information flows between service switching functions
with a series of message sets.

• The information flow between the service control
function and the service data function is based on the
X.500 specifications.

Except for the TUP, these protocols are network services on
top of the telephone companies' Signaling System #7 (SS7)
signaling networks.

Intelligent Network Rollout

The general architectural concepts of intelligent networks
are applicable to a wide range of telecommunications
networks including plain old telephony service (POTS)
networks, mobile communication networks (GSM, PCN, DECT),
ISDN, and future broadband networks. Furthermore, these
well-defined architectural principles can also be found in
standards from other organizations that have defined
equivalent or domain-specific architectures. Bellcore's AIN
(Advanced Intelligent Network) architecture shares many
features of the ITU-T approach, while ETSI, for example, has
identified various physical nodes (HLR, VLR, MSC)
communicating via the MAP protocol in mobile networks.

Although it didn't deliver all of its original promises, the
intelligent network concept is considered a success. Today
freephone remains the major revenue-earning service for
intelligent networks, and it is continuing to grow. Freephone,
split-charge, and premium rate services still generate almost
50% of intelligent network revenue. Another service that is
providing significant revenue to network operators is virtual
private network service, which is currently experiencing the
most rapid growth.

It is interesting that all of these services share the
characteristic that they require data to be available throughout a
network. This class of service clearly promotes a centralized
intelligent network solution. In fact, the fundamental
success of the intelligent network concept is that it has
simplified data management of those services for which it has
succeeded (although service management remains a
relatively minor consideration in standards organizations). The
full potential of other types of intelligent network services
still needs to be realized.

Intelligent Network Element Requirements

Hewlett-Packard has developed the HP OpenCall service
execution platform as an open, programmable, scalable,
highly available, and easily manageable platform that can be
used as a basis for implementing a range of different
elements of an intelligent network. The platform provides a set
of basic functionalities that are common to many intelligent
network elements. By installing suitable user-defined
applications or services on it, the HP OpenCall platform can be
extended to provide the service control function, service
data function, specialized resource function, and other
functionality to other nodes of the SS7 network, such as switches.
Thus, the aim of the HP OpenCall service execution platform
is to provide a platform that can be easily extended to meet
the requirements of different intelligent network elements.
The common requirements of these different network
elements are summarized in the following paragraphs.

Openness and Flexibility. One of the key goals of the
intelligent network is to promote multivendor solutions to allow
technology from different equipment providers to interwork.
In contrast, many of the early intelligent network solutions
were implemented on proprietary solutions. These
applications are often not portable across platforms, so customers
are often tied to a single equipment provider and cannot
always benefit from the latest advances in hardware.

Furthermore, intelligent networks are seen as evolutions of
existing networks, that is, new network elements,
implementing intelligent network functionality, are expected to
interwork with existing equipment. This implies that the
new elements must support multiple entry points to ensure
easy integration with other products, both at the SS7 network
interface and at the interface with the operations support
systems that manage the telephone network.

This need for multivendor solutions also drives the
standardization activity in intelligent networks (see
"Standardization — A Phased Approach" on page 49). In theory, if the
interfaces between network elements are standardized and
clearly defined, interworking should be easy. However,
despite the existence of multiple standards, local differences
make it necessary for a platform to adapt to many different
environments in terms of connectivity, protocol support, and
management. Furthermore, there is no standard environment
in the telecommunications central office. Each central office
often has its own collection of legacy systems. This implies
a need to be able to add network-specific, protocol-specific,
and environment-specific intelligence to meet customer
requirements.

Rapid Service Deployment. In the increasingly competitive
telecommunications market, network operators see the
need to differentiate themselves on their service offerings.
Operators want to be able to define, deploy, and customize
services quickly, while still guaranteeing the high levels of
quality and reliability that have traditionally been associated
with telecommunications networks. There is a need to
support all aspects of the service life cycle, including definition,
validation, installation, and management.

Performance and Determinism. The round-trip real-time budget
from a switch to a network element such as a service control
point is typically quoted as 250 milliseconds (mean) with
about 95% of responses within 500 ms. This includes the
switch processing time, the transmission and queuing times
of both the request and response messages, and the service
control point processing time (encoding and decoding of
messages, service activation and execution, query to
database). Clearly, the faster the transmission links and the
smaller the buffering in the system, the more time is
available for service control point processing. For a simple
freephone service, once the SS7 transmission times are
subtracted, we obtain a requirement for a mean service control
point processing time of 50 ms with 95% completing within

Standardization — A Phased Approach



Because of the complexity of intelligent networks, the number of
unresolved technical issues, and the significant financial investments, the
development of an intelligent network architecture supporting all
possible telecommunications services and technologies, called the target
intelligent network, will take many years. Standardization bodies have
chosen to adopt a phased approach that takes advantage of [...]
time and that guarantees backward compatibility between the different
phases.

The International Telecommunications Union, Telecommunications
Standardization Sector (ITU-T) has addressed this phased approach in its
Recommendation Q.1211.

Table 1 shows the different phases in terms of capability sets and their
descriptions. Each capability set gives a set of definitions of capabilities
that are of direct use to both manufacturers and network operators.



Table 1
Phased Approach to the Target Intelligent Network

Phase              ITU-T            Time Frame          Description
(Capability Set)   Recommendation

CS1                Q.121x           Finalized in        First standardized stage
                                    1995

CS2                Q.122x           To be finalized     CS1-compatible. Handling
                                    in 1997             of multiparty calls

CS3                Q.123x           Work started at     CS2-compatible. Handling
                                    end of 1996         of broadband aspects and
                                                        integration with the TMN

CSn                                                     Evolving towards target
                                                        intelligent network

TMN = Telecommunications Management Network

Capability Set 1 (CS1)

CS1 is the first standardized stage of intelligent network evolution
based upon the existing technology. It is a subset of the target
intelligent network architecture. CS1 defines the functional entities
(see Fig. 2 on page 47) and the interface between these entities. It also
defines the generic model of two-party call processing functionality, the
Basic Call State Model (BCSM). CS1 limits end-user access to service
processing capabilities to the following types: analog lines, ISDN basic
and primary rate interface (BRI/PRI), and analog and SS7 trunks.

The target set of services for CS1 includes universal personal
telecommunication (UPT), freephone, virtual private network (VPN),
credit card calling, user-defined routing, and others. All of these
services are considered immediately marketable and highly profitable.
The common characteristic of all CS1 services is that they apply only to
one party of the call (either the originating or the terminating party),
and generally only during the call setup phase.



The protocol used by the different CS1 functional entities to
communicate is called the Intelligent Network Application Protocol
(INAP). This protocol relies on existing underlying transport protocols
(e.g., SS7/TCAP) to convey the intelligent network [...]

Capability Set 2 (CS2)

CS2 is the second standardization stage and is a superset of the CS1
recommendations. CS2 aims to support enhanced services in addition
to the ones supported by CS1, through new capabilities that allow
handling of multiple parties that may be involved in the same call,
such as conference calling. Other capabilities will be included in CS2
to support personal mobility (UPT) and terminal mobility (DECT, GSM)
functionality. These new enhancements and capabilities are achievable by
extending the existing CS1 call-processing model and functional model.
INAP operations are also extended and new ones are to be defined.
Standardization activities are going on at ITU-T and ETSI (European
Telecommunications Standards Institute). A complete revision of the CS2
protocol is expected at the end of 1997.

The target set of services for CS2 includes call completion to busy
subscriber, conference calling, call transfer, call waiting, mobility
services (UPT, GSM), and others. The common characteristic of all these
services is that they require call party handling functionality that is
not supported in CS1.

Future Capability Sets 

CS1 and CS2 do not cover all possible user accesses and network
capabilities. According to the phased approach, ITU-T plans to introduce
CS3 (and maybe others later) to cover broadband network aspects
(intelligent network/B-ISDN integration), intelligent network/TMN
integration, and full support of mobile communications systems.
Requirements are being set up and specifications might come in 1998.

HP Approach

Because the needs, in terms of network operation, vary from one network
operator to another (operator-specific charging and billing,
implementation limitations), and because the INAP standard will continue
to evolve following the different capability sets, network equipment
providers have to work with a large number of INAP variants.

To meet this need and to be able to respond to its customers'
requirements rapidly, HP has developed a flexible service execution
platform (see the accompanying article) that is able to rapidly follow
the evolution of the INAP and support different customers' specific
variants. Tools are provided to automate the support of a message set's
syntax. The implementation of the message set's semantics has been
pushed to the application level, leaving the platform itself independent
of any supported message set. This has the benefit that a single
platform can be maintained for a varied and evolving customer base. This
independence from the INAP message set also allows the HP platform to
easily support similar message sets defined by other standards bodies
such as MAP (defined by ETSI for mobile networks) and AIN 0.1 and 0.2
(defined by Bellcore).



62 ms. The behavior must be controllable so that the system
is deterministic. Transaction rates of 100 to 10,000 transactions
per second for network elements have been requested.

High Availability. Network elements such as service control
points and home location registers are critical components
in an intelligent network. Very high system availability is
required: no more than three minutes total downtime per
year, including both scheduled and unscheduled downtime.
This necessitates highly reliable hardware with no single
point of failure and software that allows mated applications
to back each other up in the event of a natural disaster
disabling one particular site.

Furthermore, in the event of a failure at a site, during the
failure recovery phase, the network element must be responsive
to other network elements, taking less than six
seconds to resume service. If not, the network considers
that a total failure has occurred at the site.

The availability requirements also pervade the scalability
and functional evolution aspects. The system must be capable
of expansion and addition of new functionality without
disruption of service.

Similar kinds of requirements apply to service management
systems, albeit not as severe as for network elements. Service
management systems are typically allowed 30 minutes
of total downtime per year.
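
To put these downtime budgets in perspective, the short Python sketch below converts minutes of downtime per year into an availability fraction. It assumes a 365-day year, and the function name `availability` is purely illustrative; it is not part of any HP product.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(downtime_minutes_per_year: float) -> float:
    """Return the fraction of the year during which the system is up."""
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR


network_element = availability(3)      # three minutes of downtime per year
service_management = availability(30)  # thirty minutes of downtime per year

print(f"network element:    {network_element:.5%}")
print(f"service management: {service_management:.5%}")
```

Three minutes per year corresponds to roughly 99.9994% availability, and thirty minutes to roughly 99.994%.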

Scalability. A network element must be scalable in terms of
processing power, memory, persistent data storage, and
communications without compromising system availability.
On the hardware side, this means support for online upgrades
of processor boards, memory boards, disk subsystems,
communication controllers, and links. It must be
possible to perform such operations without service interruption.
On the software side, it means bringing additional
resources into service smoothly and safely. If anything goes
wrong, the software must automatically fall back to the last
operational configuration.

In general, network elements such as service control points
and home location registers are classified in terms of
transactions per second (TPS) and customer database size.
Scalability then translates into the ability to add new hardware
and/or software elements to increase either the maximum
supported TPS rate or the maximum customer database size.

Functional Evolution. The ability to add new applications or
platform capabilities or simply upgrade existing ones without
impacting the availability of the system is vital. This
means that such things as updating the operating system,
upgrading the application, adding a new communication
protocol stack, or changing the firmware on a communication
controller must be achieved without disrupting real-time
performance or system availability. This ability should
not impose too many constraints on software development
and installation. In all upgrade situations, a fallback must be
possible.

Manageability and Operation Requirements. There are detailed
and rigorous requirements concerning installability, physical
characteristics, safety, electromagnetic and electrical
environments, maintenance, reliability, and so on. Remote
management interfaces to large and complex network systems
with demanding performance and scalability requirements
are needed.

To allow easy management, complex distributed or replicated
network elements must be capable of providing a single-system
view to operations centers. At the same time, it is
also very important to provide per-system views, to allow
management of specific systems and to act upon them (typically
for fault management or performance management).
Newly installed network elements often need to be integrated
into existing management systems.



HP OpenCall Platforms 



HP OpenCall Service Execution Platform 

The HP OpenCall service execution platform is an open,
scalable, programmable, highly available, and easily manageable
platform. It is implemented as a layer of functionality on
top of the HP OpenCall SS7 platform (see article, page 58).
Given the general network element requirements listed
above, the following architectural principles and high-level
design decisions were adopted when developing the HP
OpenCall service execution platform.

The software runs on the HP-UX operating system, allowing
it to benefit immediately from advances in hardware speed
and CPU processing power.

All critical hardware and software components are replicated,
giving the platform the ability to tolerate any single
failure. An active/standby replication model was chosen at
the software level, with an instance of the platform software
running on two independent systems. Besides providing a
high degree of fault tolerance, such an approach also provides
the basis for most online upgradability tasks, such as
upgrading hardware and the operating system.

Fig. 3 shows a typical site configuration, with two instances
of the HP OpenCall service execution platform software
executing on two independent HP 9000 machines, but with
a single connection to the SS7 network. This site will appear
as a single network node to other elements of the SS7 network.
The configuration in Fig. 3 is a standard duplex configuration
of the platform. Other configurations are possible,
such as the mated-pair configuration, described later. In the
standard duplex configuration, the two machines are connected
via a dual LAN and both hosts are connected to a set
of signaling interface units. The HP OpenCall service execution
platform software runs on both hosts in active and
standby mode. The active host controls the signaling interface
units and responds to all incoming requests from the
SS7 network. Both hosts are capable of being active (i.e.,
there is no default active host).

Fig. 3. Duplex configuration of the HP OpenCall service execution
platform. Two instances of the platform execute on two independent
HP 9000 host machines with a single connection to the SS7
network.

The platform is network independent. It makes no assumption
about the structure of the SS7 network. It has the ability
to support multiple message sets, but all message set dependent
decisions are made at the application level.

Some customization is necessary before an instance of the
HP OpenCall service execution platform software can be
installed in or connected to an SS7 network. By default, the
platform supports no message set, and it offers no service
to other network elements. Minimally, users of the platform
must install one or more message sets and provide one or
more applications to respond to requests coming from other
network elements or to send requests to other network elements.
Furthermore, to ensure that the resulting solution
can be monitored and managed, it should be integrated into
an existing management system or a set of management
tools should be implemented.

A set of APIs (application programming interfaces) are provided
to access platform functionality, to be used either
locally or remotely. This provides flexibility with respect
to integration with operations support systems and legacy
management systems. A set of management tools using
these APIs are also provided, and can be used to provide
a first level of platform management. Further levels of
management (for managing the platform, installed services,
customer data, etc.) can be provided by integrating the platform
with other external management systems via the provided
APIs.

Applications, or services, executing on the platform are interpreted.
New services or new versions of existing services can
be installed at run time without interrupting processing of
TCAP (Transaction Capabilities Application Part) traffic and
without affecting other running services. Because services
execute in a virtual machine with no direct access to the
operating system, services cannot affect the availability of
the platform. Furthermore, the service execution environment
can monitor service instances, ensuring that instances
do not interfere and that resources are not permanently
consumed.

Services are independent of the operating system, protecting
them from changes and upgrades to the operating system. No
knowledge of the operating system is required to write the
service. Services are written in SLEL (Service Logic Execution
Language). Most of the basic concepts in SLEL have
been adapted from SDL (Specification and Description Language),
enhancing it with some features specific to intelligent
networks. This has the advantage that SDL is well-known
in the telecommunications industry, and many
telecommunications standards are specified in SDL.

A replicated in-memory relational database is provided as
part of the service execution environment. The structure
and contents of this database are under the user's control.
By holding all customer-related data in RAM, services can
respect the real-time response time requirements imposed
by switches, since there is no disk access to retrieve
call-related data. To achieve data persistency, a copy of the
database is maintained on a standby host.

A Management Information Base (MIB) collects information
on the state of the platform, making this information available
both via an API and via a set of management tools, and
allows external applications to manage and configure the
platform. All management operations are directed to the
active system (the standby system replays all management
commands), thus presenting a single-system view to external
applications.

The platform is implemented as a set of UNIX operating
system processes, allowing it to profit from multiprocessor
hardware.

HP OpenCall Service Creation Environment 

The HP OpenCall service creation environment allows easy
definition, validation, and testing of services. Services are
defined as finite-state machines, using a graphical language.
The service creation environment provides a set of tools to
allow the validation and simulation of services before they
are deployed on the HP OpenCall service execution platform.
The same service execution environment as described above
exists in the service creation environment to ensure that the
same behavior is observed when the service is installed in
the HP OpenCall service execution platform.

HP OpenCall Service Management Platform 

The HP OpenCall service management platform is capable
of managing multiple HP OpenCall service execution platform
sites. Such a distributed configuration introduces an
extra degree of scalability, both in terms of transaction
throughput capacity and database capacity.

Platform Design 

This section discusses five key aspects of the HP OpenCall
service execution platform solution: the service execution
environment, platform flexibility, high availability, database
replication, and scalability.

Service Execution Environment

The HP OpenCall service execution platform provides an
execution environment for telecommunications services.
These services are usually developed in response to new
telecommunications requirements, and typically provide
additional functionality to other elements in the SS7
network.

Programs defined in the Service Logic Execution Language
(SLEL) are interpreted by the platform at run time, and can
use language primitives to access the functionality of the
underlying platform. The primitives enable them to send and
receive TCAP messages, read and write message attributes,
access and update the in-memory database, communicate
over other external connections, send and receive events,
set and reset timers, manage shared resources, log data to
disk, and perform other functions.

Programs written in SLEL define finite-state machines, that
is, a service waits in any given state until a recognizable
input stimulates a transition to the next state. During the
transition the service can carry out processing and output
messages. Fig. 4 summarizes the functionality of the SLEL
virtual machine.

TCAP = Transaction Capabilities Application Part

Fig. 4. Services are written as finite-state machines in the Service
Logic Execution Language (SLEL) and run on the SLEL virtual
machine, which provides the functionality shown here.
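
The finite-state behavior described above can be illustrated with a minimal Python sketch. This is not SLEL: the class `ServiceInstance`, the state and input names, and the freephone-style number map are all hypothetical, chosen only to show a service waiting in a state until a recognizable input triggers a transition that does some processing and emits output.

```python
from typing import Callable, Dict, List, Tuple

# An action runs during a transition and returns the messages to output.
Action = Callable[[dict], List[str]]
# (current state, input name) -> (next state, action)
TransitionTable = Dict[Tuple[str, str], Tuple[str, Action]]


class ServiceInstance:
    """A finite-state machine that reacts only to recognizable inputs."""

    def __init__(self, start_state: str, table: TransitionTable) -> None:
        self.state = start_state
        self.table = table
        self.outputs: List[str] = []

    def feed(self, input_name: str, message: dict) -> bool:
        """Apply one input; return False if it is not recognized in this state."""
        entry = self.table.get((self.state, input_name))
        if entry is None:
            return False  # keep waiting in the current state
        next_state, action = entry
        self.outputs.extend(action(message))  # processing and output
        self.state = next_state
        return True


# Hypothetical freephone-style translation data.
NUMBER_MAP = {"0800123": "5551000"}


def translate(message: dict) -> List[str]:
    return ["connect:" + NUMBER_MAP.get(message["dialed"], "unknown")]


table: TransitionTable = {
    ("idle", "query"): ("answered", translate),
    ("answered", "release"): ("idle", lambda message: []),
}

service = ServiceInstance("idle", table)
ignored = service.feed("release", {})  # not recognized while idle
handled = service.feed("query", {"dialed": "0800123"})
```

Keeping the transition table as data rather than code mirrors the idea that a service is a description to be interpreted, not a program with direct access to the host.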

The following principles have been adopted in the development
of the HP OpenCall service execution environment:

• Services are not aware of the operating system. They are
isolated from the HP-UX interface. This allows easy migration
between HP-UX versions. Furthermore, the application
developer only needs to provide service-specific logic and
need not be concerned with the startup, switchover, and
failure recovery aspects of the platform, since these are
transparent to the service logic.

• Services are not aware of replication. Replication of services
and restart of services after a failure are handled automatically
by the platform. No failure recovery code needs to be
written by the service developer. The service programmer
can assume a single-system view. To maintain this illusion,
services can only access the local MIB and can only receive
locally generated events. Furthermore, the service execution
environment on the standby node is an exact replica of the
active node's, providing the same services, same resources,
same MIB structure, and so on.

• Real-time response to switches. The service execution
environment gives the highest priority to the processing
of TCAP messages and other service-related events (e.g.,
popped timers, received events). All other activities such as
database or MIB accesses by external applications are run as
background tasks. Service execution cannot be interrupted.
A single state transition runs to completion. All local access
by a service is synchronous (even to the database). A service
only blocks if it explicitly requests blocking, for example to
wait for the next TCAP message, wait for a timer, or wait for
an event.

• Services cannot crash the platform. The service execution
environment presents a virtual machine as its upper interface.
Services can only access the platform functionality
from SLEL. No pointer manipulation is available. Resource
allocation and deallocation are done automatically by the
platform. There is no possibility for core dumps or memory
leaks.



• Online upgradability of services is possible. Because services
are interpreted, services can be enabled, disabled, installed,
or removed at run time without stopping the platform.
Multiple versions of a service can be installed simultaneously,
although only one can be enabled at any time. Instances of
the previously enabled version are allowed to run to completion,
so that TCAP traffic is not interrupted.

• The platform manages and monitors service instances. A
single service cannot block other services from executing.
There is a limit on the number of instructions that can be
executed before a service is terminated. This prevents infinite
loops and prevents one service from blocking out all
other services. Resources are automatically reclaimed once
the service instance has completed. There is also a limit on
the total lifetime of a service, as a way of garbage collecting
unwanted service instances. Both limits can be set on a
per-service basis, and can be altered at run time.
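
The instruction limit in the last principle can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a service is modeled as an iterable of callables, and the names `run_service` and `InstructionLimitExceeded` are invented here rather than taken from the platform.

```python
import itertools
from typing import Callable, Iterable


class InstructionLimitExceeded(Exception):
    """Raised when a service instance uses up its instruction budget."""


def run_service(instructions: Iterable[Callable[[], None]], limit: int) -> int:
    """Execute instructions one by one; terminate the service at the limit."""
    executed = 0
    for instruction in instructions:
        if executed >= limit:
            raise InstructionLimitExceeded(f"terminated after {limit} instructions")
        instruction()
        executed += 1
    return executed


def noop() -> None:
    pass


# A well-behaved service finishes inside its budget.
finished = run_service([noop] * 10, limit=1000)

# A service stuck in an infinite loop is cut off instead of starving others.
try:
    run_service(itertools.repeat(noop), limit=1000)
    terminated = False
except InstructionLimitExceeded:
    terminated = True
```

Because the interpreter, not the service, counts instructions, a faulty service cannot opt out of the limit.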

Platform Flexibility 

There is a strong requirement for the HP OpenCall service
execution platform to be flexible. Obviously, it should make
as few assumptions as possible about the applications that
are installed and executing on it, and it should be able to
integrate into any central office environment, both in terms
of its connection with the SS7 network and its links with
management applications.

Multiple Message Sets. The platform by default supports both
a TCAP/SS7 connection and a TCP/IP connection. It makes
no assumption about the protocols that are used above
these well-standardized layers. In practice, any message set
defined in ASN.1 (Abstract Syntax Notation One) can be
loaded into the platform, and multiple message sets can be
supported. The platform can encode and decode messages
belonging to one of the installed message sets.

The message set customization tools take as input an annotated
ASN.1 definition of a message set. The output is a
message set information base, which contains a concise
definition of the message set. Multiple message set definitions
can be stored in a single message set information base.
The HP OpenCall service execution platform's service execution
environment loads the message set definitions from
the message set information base at startup time. Services
running in the execution environment can request the platform's
encode/decode engine to process any incoming or
outgoing message with respect to the installed message sets.

The same message set information base can be loaded into
the HP OpenCall service creation environment. The message
set definitions are available to service developers as part of
the help facility. The message set information base is also
available in the validation environment, allowing the user to
submit message-set-specific message sequences to test the
logic flow of the developed services. The traffic simulation
tool uses the message set information base to encode and
decode the user-supplied message sequences.

Flexible Service Instantiation. When a new TCAP transaction
request is received by the platform, service instances must
be created and executed to process the message. To offer a
high degree of flexibility, the policy for choosing the correct
service instances is programmable, that is, a user-supplied
piece of service logic is executed to choose the most appropriate
service. The decision can be based on any criterion,
such as priority of the request, customer-specific data held
in the database, overload status of the platform, and so on.

To allow tracking of resources, only one service instance
controls the TCAP transaction, although it can be passed
between instances. In this way, if the instance that currently
controls a transaction exits unexpectedly, the platform can
close the associated transactions and free the associated
resources.

Flexible Service Structure. The intelligent network market has
traditionally taken a service independent building block-based
approach to service implementation. That is, a set
of service independent building blocks are provided by the
underlying execution platform, allowing applications to
merely link these building blocks together to provide services
to the SS7 network. The disadvantage of this approach
is that the only functionality available to programmers is the
set of available service independent building blocks. These
are often message-set-specific and difficult to customize.

The HP OpenCall service execution platform does not provide
a default set of service independent building blocks.
Instead, it provides the means to structure applications as a
set of components. Thus, if required, a user can implement
the ITU-defined set of standard service independent building
blocks and then use these components to implement applications
to provide higher-level services. Of course, the user
can also decide to implement an entirely different set of
service independent building blocks.

Furthermore, the platform does not distinguish between a
service independent building block (or component) and an
application. Instead, it views both as services. Both can be
arbitrarily complex pieces of logic, defined and installed by
the user. How they interact and what functionality they provide
are entirely under the user's control. To provide a single
service to the SS7 network might only involve a single instance
of service logic, or it might involve the creation and
interworking of multiple such instances.

Platform Management. The platform exports information on
its state via a Management Information Base (MIB). The
MIB can be used to both monitor and control the platform.
For example, installing a new service or a new version of an
existing service onto the platform is performed via the MIB.
Similarly, adding a new TCP/IP connection or supporting a
new subsystem number is also achieved via the MIB. Both
cases involve creating a new object in the MIB. Statistics on
CPU use, TCAP traffic, service execution, database memory
use, and so on are all held in the MIB and can be retrieved by
external applications by issuing a request to the appropriate
objects.

The information is presented as a hierarchy of objects, with
each object representing a part of the platform's functionality.
The hierarchical structure provides an intuitive naming
scheme, and this also allows easy integration into standard
CMIS (Common Management Information Service) or SNMP
(Simple Network Management Protocol) management
frameworks. Users can create new objects, delete existing
objects (obviously changing the functionality of the platform
in the process), update existing objects, or just retrieve
information from individual objects. As mentioned previously,
APIs that can be used remotely are provided, along with a
set of management tools that provide a first level of platform
management.
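
As a rough illustration of a hierarchy of objects that can be created, updated, retrieved, and deleted, consider the toy Python model below. The class `Mib`, the slash-separated path scheme, and the object names are assumptions made for this sketch; the article does not describe the actual MIB naming syntax or API.

```python
from typing import Dict


class Mib:
    """Toy hierarchy of named objects addressed by slash-separated paths."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def create(self, path: str, **attributes) -> None:
        parent = path.rpartition("/")[0]
        if parent and parent not in self._objects:
            raise KeyError(f"parent object {parent!r} does not exist")
        if path in self._objects:
            raise ValueError(f"object {path!r} already exists")
        self._objects[path] = dict(attributes)

    def update(self, path: str, **attributes) -> None:
        self._objects[path].update(attributes)

    def get(self, path: str) -> dict:
        return dict(self._objects[path])  # return a copy, not the live object

    def delete(self, path: str) -> None:
        # Deleting an object also removes everything beneath it.
        doomed = [p for p in self._objects
                  if p == path or p.startswith(path + "/")]
        if not doomed:
            raise KeyError(path)
        for p in doomed:
            del self._objects[p]


mib = Mib()
mib.create("services")
mib.create("services/freephone", version=1, enabled=False)  # install a service
mib.update("services/freephone", enabled=True)              # enable it
snapshot = mib.get("services/freephone")
mib.delete("services")                                      # removes the subtree
```

Addressing objects by path is what makes the naming scheme map naturally onto tree-structured management frameworks.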

Customized Overload Policy. One of the most important
responsibilities of any SS7 network element is to respond in a timely
manner to incoming requests. The response time on requests
must appear to be bounded, and [...] of replies must be generated
within a fixed time interval. This implies that the network
element must react to overload situations.

With the HP OpenCall service execution platform, the overload
policy is programmable. That is, user-defined logic
specifies how the network element reacts when the load is
high. The platform itself collects statistics on a number of
overload indicators such as CPU use, transaction rate, number
of unprocessed messages, and average queuing time for
such messages. These values are available in the MIB and
can be viewed both by the overload service (the logic
implementing the overload policy) and by external management
applications.

The programmability of the overload policy offers a high
level of flexibility. For example, under heavy load, the overload
service may decide to:

• Reject new incoming requests but continue to accept
messages relating to ongoing transactions.

• Request that other network elements reduce the number of
requests. Obviously such a policy is network-specific and
message-set-specific, requiring knowledge of both the SS7
network configuration and the message sets supported by
remote switches.

• Reject new requests that require complex processing
(obviously application-specific), or reject requests for
low-revenue-generating services (again, application-specific).

The platform also provides a set of hooks to control the traffic
flow. The overload policy can, for example, request the
platform to reject new transaction requests (i.e., only accept
messages relating to ongoing transactions), to limit the number
of instances of a given service, or to reject traffic from a
particular remote network element.
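
A programmable overload policy of this kind might look like the following Python sketch. The indicator names follow the statistics listed above, but the thresholds, the `low_revenue` flag, and the function `admit` are hypothetical choices for illustration and do not come from the platform.

```python
from dataclasses import dataclass


@dataclass
class LoadIndicators:
    cpu_use: float             # fraction of CPU in use, 0.0 to 1.0
    transaction_rate: float    # transactions per second
    unprocessed_messages: int  # messages waiting to be handled
    average_queue_ms: float    # mean queuing time of those messages


def admit(load: LoadIndicators, new_transaction: bool, low_revenue: bool) -> bool:
    """Decide whether to accept a message under the current load."""
    if not new_transaction:
        return True  # always continue ongoing transactions
    # Thresholds below are invented for this sketch.
    if load.cpu_use > 0.85 or load.average_queue_ms > 20.0:
        return False  # heavy load: reject every new request
    if load.cpu_use > 0.70 and low_revenue:
        return False  # moderate load: shed low-revenue services first
    return True


calm = LoadIndicators(0.30, 200.0, 0, 1.0)
busy = LoadIndicators(0.75, 900.0, 40, 8.0)
heavy = LoadIndicators(0.95, 1500.0, 400, 35.0)
```

Expressing the policy as ordinary logic over measured indicators is what allows each operator to tune or replace it without changing the platform.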

High Availability 

All critical components of the HP OpenCall service execution
platform are replicated (see Fig. 5). The core set of software
processes operate in active/standby mode. This forms the
basis both of the platform fault tolerance and of its online
upgradability policy. It uses the HP OpenCall SS7 high
availability platform, with every critical process being a client of
the fault tolerance controller (see article, page 65). For simplicity,
not all of the processes are shown, and not all of the
interprocess links are shown.

Besides replication of processes, the platform also uses
mirrored disks, duplicate signaling interface units, and dual
LAN connections.

The principles that form the basis for the high availability
policy are discussed in the following paragraphs.

SLEE = Service Logic Execution Environment

Active/Standby Model. An instance of the HP OpenCall service
execution platform runs on each of two independent
machines. One instance, the active, is responsible for
responding to all incoming requests, whether from the
SS7 network or from management applications. The other
instance, the standby, maintains a copy of the active's state,
always ready to take over processing of such requests. The
decision to adopt an active/standby model brings the following
benefits:

• The code is simpler, resulting in fewer run-time errors.

• It is easy to provide a single-system view (that is, the active
instance defines the state of the platform).

• It isolates errors because the two hosts are not performing
the same tasks or the same types of tasks.

• The standby host is available for online upgrades, configuration
changes, and so on.

The alternative, to adopt a load-sharing model (with requests
being processed in parallel on two or more hosts), would
have required more complex communication protocols between
the instances (to ensure synchronization and a
single-system view) as well as a greater possibility of an
error simultaneously impacting more than one host.

High Service Availability. The solution guarantees high service
availability to the SS7 network in the event of any single hardware
or software failure. To achieve this, the standby host
must always be ready to take over in the event of a failure
on the active host. For this reason, the standby maintains a
copy of the in-memory database and of the service execution
environment. All relevant changes to the state of the active
(e.g., changes to the active database) are immediately propagated
to the standby. In the event of a switchover, the state
of the standby instance defines the current state. That is, the
standby does not need to retrieve information from either
the failed instance or from other external applications to
become active. Thus, the transition to the active state is
instantaneous, guaranteeing high service availability.

Centralized Fault Recovery Decisions. All switchover and
process restart decisions are made by a centralized process,
the fault tolerance controller, on each node. These two
processes continuously exchange information on the state of the
two hosts. All other processes obey the centralized
decision-maker. This greatly simplifies failure recovery and error
handling. In the event that both LANs go down, the signaling
interface units are used as a tiebreaker. The node that
controls the signaling interface units remains active (see article,
page 65).

Fig. 5. Replication of critical components in the HP OpenCall
service execution platform.

Online Upgradability and Maintenance. Because all of the
critical components are replicated, the existence of the standby
host is used as a basis for all online upgradability operations,
such as changing the operating system, installing new
versions of the platform, reconfiguring services or the service
execution environment, performing rollback operations on
the database, and so on. Because most upgrade operations
can effectively be performed online in this way, the platform
meets its downtime-per-year requirements.

Replication Does Not Impact Services. Service programmers do
not need to be aware of the replication of the platform.
Furthermore, propagation of data to the standby is performed as
a background task, implying a minimal impact on response
time. Algorithms for resynchronizing the standby after a
failure are also run in the background.

Respawnability. It is important that a failed host or a failed
process be restarted quickly. If the standby host is not
available, this introduces a window of vulnerability during which
a second failure could cause the whole network element to
be unavailable, directly impacting service availability. This
respawnability feature is possible because of the continuous
availability of an active system. The failed instance will
restart, rebuilding itself as a copy of its peer. Because the
active instance is assumed to be consistent, the respawned
instance is also guaranteed to be consistent. However, to
avoid rebuilding an instance to a nonconsistent state,
instances are not respawned if the peer is not active (which
might happen in the rare case of a double failure). In such
cases, manual intervention is required to restart the
complete platform. This respawn policy ensures rapid failure
recovery in the case of a single failure but prevents
erroneous failure recovery in the case of a double failure.
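
The respawn policy above amounts to a single decision on the
state of the peer. The following is a minimal sketch in Python;
the names PeerState and respawn_action are hypothetical and are
not part of the platform.

```python
from enum import Enum


class PeerState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    DOWN = "down"


def respawn_action(peer_state: PeerState) -> str:
    """Decide what to do with a failed instance, given its peer's state."""
    if peer_state is PeerState.ACTIVE:
        # The active peer is assumed consistent, so rebuild as a copy of it.
        return "respawn_as_copy_of_peer"
    # No active peer (double failure): never rebuild automatically,
    # because there is no consistent state to copy from.
    return "wait_for_manual_restart"
```

Refusing to respawn without an active peer trades recovery
speed for consistency in the double-failure case.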

Support for Complete Restart. Although the platform is
designed to tolerate any single failure, in certain cases it may
be necessary to stop and restart both hosts (either because
of a double failure or for operational reasons). Disk-based
copies of both the ^^B and the in-memory database are
maintained, and these can be used to rebuild both hosts.
However, some data loss may occur. This is considered to
be acceptable, since double failures are rare.

Database Replication 

At the core of the service execution environment is an
in-memory replicated database. The database is in memory
to guarantee rapid access and update times. To ensure high
availability, copies of the in-memory database are kept on
both the active host and the standby host. Critical updates
to the active database are propagated to the standby.

The structure and contents of this database are under the
user's control. When defining the database structure, the
user must also define the replication policy for every field.
Of course, propagating updates to the standby will impact
performance, but because the user is allowed to specify the
replication policy, the user can also control the impact.

Traditionally, database systems achieve persistency by
maintaining a copy of the database on disk or on mirrored disks.
In the HP OpenCall service execution platform, the primary
standby copy is held in memory on a standby host. This
offers a number of advantages over the traditional approach:

• Writing to a remote in-memory copy is quicker than logging
to disk, and therefore has a smaller impact on SS7 traffic.

• The degree of availability is higher. In the event of a failure
on the active host, the standby copy is immediately
available. It is not necessary to recreate a copy of the database.

• The standby copy can be taken offline and used to
restructure the database or to roll back to a previous version of the
database.

Periodically the active host generates a disk-based copy of
the database. This checkpoint of the database serves a
number of purposes:

• The checkpoint ensures that the platform can recover from
double failures.

• The checkpoint is used to reinitialize a standby host if and 
when it is restarted after a failure or shutdown. 

• The checkpoint can be used for auditing purposes by 
external database management systems. 

Three algorithms are critical to the database replication
scheme: the data replication algorithm, the resynchronization
algorithm, and the rollback/restore algorithm.

Data Replication Algorithm. As mentioned above, the user
specifies the replication policy for every field in the database.
For certain data in the database, it may not be necessary to
replicate changes to the standby host. Typical examples are
counters or flags used by services.

Consider a set of fields that collect statistics on SS7 traffic.
These would typically be incremented every time that a new
request is received, and would be reset to default values
periodically. For many services, it may be acceptable not to
propagate these traffic statistics to the standby database. This
implies that some data will be lost in the event of a failure
on the active host (all traffic statistics since the last reset),
but it also implies that less CPU time is consumed replicating
this data to the standby. This trade-off is application-specific,
so the decision to replicate is left to the user.

For fields that are marked for replication, the active database
instance will generate an update record called an external
update notification, containing both the old value and the
new value, for every update. This record is then propagated
to the standby database instance. By comparing the old value
in the record with the value it currently holds, the standby
database can verify that it is consistent with the active. In the
case of an inconsistency, the standby database shuts itself
down. It can then be restarted, rebuilding itself as a copy of
the active database.

The flow of an external update notification is shown in Fig. 6.
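
The per-field replication policy and the old-value check can be
illustrated with a small Python sketch. The class and field names
are hypothetical, and raising an exception stands in for the
standby shutting itself down.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class UpdateNotification:
    """Illustrative external update notification for one field."""
    field: str
    old_value: Any
    new_value: Any


class InconsistentStandby(Exception):
    """Raised where the real standby would shut itself down."""


class ActiveDatabase:
    def __init__(self, data: Dict[str, Any], replicated: Iterable[str]) -> None:
        self.data = dict(data)
        self.replicated = set(replicated)  # user-defined replication policy

    def update(self, field: str, new_value: Any) -> Optional[UpdateNotification]:
        old_value = self.data.get(field)
        self.data[field] = new_value
        if field in self.replicated:
            return UpdateNotification(field, old_value, new_value)
        return None  # e.g. traffic counters: not propagated


class StandbyDatabase:
    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)

    def apply(self, note: UpdateNotification) -> None:
        # The old value must match what the standby currently holds.
        if self.data.get(note.field) != note.old_value:
            raise InconsistentStandby(note.field)
        self.data[note.field] = note.new_value
```

Carrying the old value costs a little bandwidth per update but
lets the standby detect divergence without a separate audit pass.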

To handle double failures, the external update notification
is also written to disk on both the active and standby hosts.
This is performed as a background task on the active host to
minimize the impact on the transaction processing activity.
The active host does not attempt to maintain a replica copy
of the database on disk. Instead, a log is maintained of all
of the updates that have been performed. To rebuild the
in-memory database, the log must be replayed.

To manage the disk space required for these external update
notification logs, the active host takes regular checkpoints.
This task is treated as a low-priority background activity,
and the frequency of the checkpoints can be controlled by
the user (obviously as a function of the available disk space).
When a checkpoint operation has completed, redundant
external update notification log files can be removed. By
default, the active host keeps the two most recent
checkpoint files and intermediate external update notification log
files on disk (following the principle of replicating all critical
components).

Because database updates must be allowed during the
checkpointing operation, each checkpoint file has an associated
sequence of external update notifications that were generated
while the checkpoint was being taken. To rebuild a copy of
the in-memory database, both the checkpoint file and the
associated external update notifications are required.
Typically, the contents of the disk containing these
database-related files will be as depicted in Fig. 7. The external
update notifications are stored in a sequence of files. The rate
at which one file is closed and a new file in the sequence is
opened can be controlled by the user (either as a function
of time or file size, or on demand). Thus, multiple external
update notification log files can be created between two
checkpoints, or indeed during a checkpoint operation. A
copy of the in-memory database can be rebuilt either from a
checkpoint file and associated external update notification
log files (those files generated during the checkpoint) or
from a checkpoint file, associated external update
notification log files, and subsequent external update notification
log files. To hold a copy of the database on a remote storage
device, the user should close the current external update
notification log file (the active database will close the
current file, assign it a file name in the sequence, and open a
new current external update notification log file), and the
user should then copy to stable storage the most recent
checkpoint file and the external update notification log files
generated during and since that operation.

Fig. 6. External update notification flow. (1) Update performed
on active database. (2) Update record (external update
notification) sent to standby. (3) External update notification logged to
disk on active host. (4) Update performed on standby database.
(5) External update notification logged to disk on standby host.
(6) Acknowledgment sent to active host. (7) External update
notification forwarded to interested external applications.

Fig. 7. Checkpoint and external update notification files on disk.
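
Rebuilding from a checkpoint and its log files can be sketched
as follows in Python. The record layout is hypothetical. The
sketch also assumes that replay simply sets the new value,
so that a record already reflected in the checkpoint (because it
was generated while the checkpoint was being taken) is harmless
to replay; the article does not describe how the platform
handles this case.

```python
from typing import Any, Dict, Iterable, List, Tuple

# One logged record: (field, old_value, new_value). Layout is illustrative.
Record = Tuple[str, Any, Any]


def rebuild(checkpoint: Dict[str, Any],
            log_files: Iterable[List[Record]]) -> Dict[str, Any]:
    """Rebuild an in-memory copy from a checkpoint plus ordered log files."""
    db = dict(checkpoint)  # never modify the checkpoint itself
    for log_file in log_files:  # files must be supplied in sequence order
        for field, _old_value, new_value in log_file:
            db[field] = new_value  # idempotent replay (assumption)
    return db
```

Passing only the log files generated during the checkpoint
yields the state at the end of the checkpoint; adding the
subsequent files moves the copy forward in time.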

Resynchronization Algorithm. Failures will happen. Therefore,
a failure recovery algorithm is required. One critical
component is the recovery of the standby copy of the in-memory
database. The checkpoint and external update notification
log files also play a critical role in this algorithm. The
algorithm is complex because updates are permitted on the
active database while the standby database is being rebuilt.
The resynchronization process can take a long time (in the
order of minutes), so it would not be acceptable to disallow
database updates during that period.

The sequence of steps in resynchronization is as follows:

1. The standby host copies the most recent database
checkpoint file and external update notification log files from the
active file system to its local file system.

2. The standby host rebuilds the in-memory copy of the
database from this consistent set of files.

3. When this operation is complete, the standby database
asks the active database to close the current external
update notification log file. It then retrieves that file and
replays the external update notification records.

4. Step 3 is then repeated. The assumption is that, because
the standby host is not performing any other tasks, with
each iteration of step 3, the size of the current external
update notification log file will be reduced. In effect, the state
of the standby database is moving closer to that of the active
database. Eventually the size of the current external update
notification log file will be small enough for the algorithm to
move to the next step.

5. The standby database again asks the active database to
close the current external update notification log file. At this
point, a connection is also established between the two
database copies. Now, while retrieving and processing the
latest external update notification log file, the standby
database is also receiving new external update notifications via
the socket connection.

6. Once the latest external update notification log file is
processed, the standby database starts to replay the queued
external update notifications. At this point, with the
establishment of the real-time connection between the two
database copies, the two copies are considered to be
synchronized, and the standby database is considered to be hot.
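
The six steps above can be simulated in a few lines of Python.
This is a single-threaded toy, not the platform's
implementation: the Active class, the tick callback (which
stands for traffic arriving on the active host while the standby
replays), and the threshold parameter are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Tuple[str, Any]  # (field, new_value), simplified for the sketch


class Active:
    """Toy stand-in for the active database and its current log file."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)
        self.current_log: List[Record] = []
        self.live: List[Record] = []  # notifications sent over the connection
        self.connected = False

    def update(self, field: str, value: Any) -> None:
        self.data[field] = value
        self.current_log.append((field, value))
        if self.connected:
            self.live.append((field, value))

    def close_current_log(self) -> List[Record]:
        closed, self.current_log = self.current_log, []
        return closed


def replay(db: Dict[str, Any], records: List[Record],
           tick: Callable[[], None]) -> None:
    for field, value in records:
        db[field] = value
        tick()  # the active may accept new updates while we replay


def resynchronize(active: Active, checkpoint: Dict[str, Any],
                  logs: List[List[Record]], tick: Callable[[], None],
                  threshold: int = 1) -> Dict[str, Any]:
    standby = dict(checkpoint)          # steps 1-2: copy files and rebuild
    for log in logs:
        replay(standby, log, tick)
    while True:                         # steps 3-4: catch up until small
        log = active.close_current_log()
        replay(standby, log, tick)
        if len(log) <= threshold:
            break
    last = active.close_current_log()   # step 5: final file + connection
    active.connected = True
    replay(standby, last, tick)
    replay(standby, active.live, tick)  # step 6: drain queued notifications
    return standby
```

The loop terminates only if the standby replays faster than the
active generates updates, which is the assumption stated in
step 4.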

Rollback/Restore Algorithm. As with all database systems, it is
also possible to roll the in-memory database back to a
previous state. Again, the checkpoint and external update
notification log files play a critical role in this algorithm.

This operation can be achieved "online" by using the standby
copy of the database, which can be taken offline without
impacting the active host. While the latter continues to
process TCAP traffic, the rollback algorithm can be applied to
the standby database. In effect, it rebuilds itself from a
checkpoint file and associated external update notification log files.
Once this operation is completed, the user can request a
switchover of the two hosts. The previously offline standby
host will now begin to receive and to process TCAP traffic.
The peer host should then be rebuilt as a copy of this new
active host, using the synchronization algorithm described
above.

Scalability 

The HP OpenCall service execution platform is implemented
as a set of processes running on the HP-UX operating system.
This gives it a degree of scalability, since it can run on a range
of HP machines and can also benefit from multiprocessor
hardware. Low-end configurations are executed on a
single-processor, 48-MHz machine with liSM bytes of RAM. The
high-end configurations currently execute on a dual-processor,
lH>MHz machine with TtiSM bytes of RAM. The migration
to HP-UX 10.20 increases the capacity of the platform both
in terms of the maximum supported TPS rate and the
maximum database size.

An extra degree of scalability is provided with the support
of mated-pair configurations. Fig. 4 showed the standard
single-site duplex configuration, that is, an active/standby
configuration running at one geographical location with a
single connection to the SS7 network (multiple links are
used for redundancy and load sharing, but a single address
is shared by these links). A distributed solution, with multiple
active/standby configurations running at a set of sites (each
with its own SS7 address), provides a number of benefits:

• Extra TPS capacity, since sites can process traffic in parallel

• Increased database capacity

• Tolerance of site failures or outages. This is critical if and
when it is necessary to shut down a complete site for
maintenance or operational reasons (or in the unlikely case of a
double failure hitting both the active and standby hosts).

The HP OpenCall service management platform can be used
to manage a distributed configuration, as shown in Fig. 8.
Although this is referred to as a mated-pair configuration, it
is not limited to two sites; multiple sites can be supported.
Furthermore, each site is not required to have the same
database, that is, the contents of the databases at different
sites can be completely different. However, in general, sites
will be paired; hence the name mated-pair. For a given site,
a copy of its database will also be maintained at one other
geographically remote site. This remote site then has the
ability to take over if the original site is shut down. The role
of the HP OpenCall service management platform is to
maintain consistency across these multiple sites.

The centralized HP OpenCall service management platform
maintains a disk-based copy of each in-memory database. It
receives notifications (external update notifications) from
each site when the in-memory database is changed. Such
external update notifications are then propagated to all other
sites containing the altered data.

If an operator or system administrator wishes to change the
contents of the in-memory database, the request should be
sent to the service management platform. It will then forward
the request to all sites holding a copy of the altered data.

Sites will typically process TCAP traffic in parallel, so it is
possible that the same data may be changed simultaneously
at two separate sites. Both will generate external update
notifications and both external update notifications will be
propagated to the service management platform. The
platform is responsible for detecting this conflict and for
ensuring the consistency of the replicas. It uses the old value field
of the external update notification to detect that
simultaneous updates have occurred. Consistency is ensured by
rejecting any external update notification in which the old
value does not match the current value held by the service
management platform, and by applying the current value at
the site from which the erroneous external update
notification was received. In effect, this establishes the service
management platform's database as the master database. In the
event of a discrepancy, the service management platform
ensures that its view is enforced and that all copies will
become consistent.

Fig. 8. Centralized management of multiple HP OpenCall service
execution platform nodes.
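
The conflict rule described above can be sketched in Python as
follows. The class name, method name, and the dictionary-based
site copies are hypothetical simplifications; the sketch assumes
every field exists in the master copy.

```python
from typing import Any, Dict


class ManagementPlatform:
    """Toy master database that arbitrates updates from several sites."""

    def __init__(self, master: Dict[str, Any],
                 sites: Dict[str, Dict[str, Any]]) -> None:
        self.master = dict(master)
        self.sites = sites  # site name -> that site's copy of the data

    def on_notification(self, origin: str, field: str,
                        old_value: Any, new_value: Any) -> bool:
        """Handle one external update notification; True if accepted."""
        if self.master[field] != old_value:
            # Simultaneous update detected: reject it and enforce the
            # master's current value at the originating site.
            self.sites[origin][field] = self.master[field]
            return False
        self.master[field] = new_value
        for name, copy in self.sites.items():
            if name != origin and field in copy:
                copy[field] = new_value  # propagate to other holders
        return True
```

With this rule the first notification to reach the platform
wins, and the losing site is overwritten with the master value.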

The HP OpenCall service management platform also supports
an auditing function. The service management platform
orders a checkpoint of the HP OpenCall service execution
platform's in-memory database, and then compares the
contents of the resulting checkpoint and external update
notification log files with the contents of its own disk-based
database. Discrepancies are reported to the system administrator.

Conclusion 

The intelligent network architecture allows the
telecommunications industry to move to a more open solution, in
which the creation, deployment, and modification of
services is independent of a particular network equipment
supplier and equipment from different providers can interwork
to provide new and innovative services. Having independent
and neutral intelligent network platform providers can
enforce implementation of standards, hence ensuring better
interconnectivity.

The HP OpenCall service execution platform is an open and
flexible platform. It has been installed in a large number of
different networks throughout the world, and has shown its
ability to interwork with equipment from a host of other
network equipment providers. It has been used to implement
mission-critical network elements such as service control
points and service data functions using standard, off-the-shelf
computer technology. This use of standard hardware and a
standard operating system will allow operators to benefit
from the evolution of information technology with few
additional costs and limited engineering risk.

HP-UX 9.* and 10.0 for HP 9000 Series 700 and 800 computers are X/Open Company UNIX 93
branded products.

UNIX is a registered trademark in the United States and other countries, licensed exclusively
through X/Open Company Limited.

X/Open is a registered trademark and the X device is a trademark of X/Open Company Limited
in the UK and other countries.






The HP OpenCall SS7 Platform 

The HP OpenCall SS7 platform allows users to build computer-based 
signaling applications connected to the SS7 signaling network. 

by Denis Pierrot and Jean-Pierre Allegre 



Today's telecommunications operators need to offer more
and more services to their customers. Because of
deregulation and the resulting competition, network operators have
to be able to bring to the market useful value-added services
to differentiate themselves from the competition. To support
new functionalities, telecommunications networks have
undergone an important restructuring starting in the mid-1980s.
This restructuring resulted in the separation of the signaling
functions from the voice transmission functions.

Signaling includes all of the necessary procedures to set up,
tear down, and control calls. Before this split was made, the
networks were using inband signaling: the signaling
information was conveyed over the same channel as the voice with
some predefined tones (see Fig. 1a). This technique had many
drawbacks, including:

• Long call setup times. Addressing information needed to be
outpulsed one digit at a time for each intermediate switch in
the voice path.

• Security problems. Billing fraud was possible by faking the
inband signaling and billing tones.

• Limitations on the number of new services that could be
provided.




Fig. 1. (a) Before signaling was separated from voice transmission,
networks used inband signaling: the signaling information was
conveyed over the same channel as the voice. (b) After the split, all
of the connection setup, teardown, and control is effected via the
signaling network and the voice trunks are dedicated to transporting
voice only.



The Signaling Network 

With the separation of signaling and voice transmission,
the concept of the signaling network was introduced. The
signaling network is a digital, robust, packet network with
built-in redundancy to achieve a high degree of availability.
Fig. 1b shows the typical topology. All of the connection
setup, teardown, and control is effected via the signaling
network and the voice trunks are dedicated to transporting
voice only.

The creation of the signaling network, often called common
channel signaling or CCS, makes it possible to implement
an important set of new services because of the global
control it provides over the transmission network. The current
implementation of the signaling network is called Signaling
System #7, or SS7.

The signaling network is the foundation for the intelligent
network, which makes it possible to deliver new services to
network operators' customers in a timely and cost-effective
manner. The intelligent network is programmable so that new
services can be easily provisioned. It uses vendor-independent
interfaces so that multivendor networks can be built. It
allows rapid introduction of new services, and it distributes
the intelligence in the network into a few intelligent elements.
For more information on intelligent networks, refer to the
article on page 46.

Elements of the Signaling Network 

The signaling network is a packet network built using the
following elements:

• The service switching point is a switch that is able to
interact with the signaling network.

• The signaling transfer point is a packet switch that routes
messages between end points of the SS7 network. Signaling
transfer points are often compared to IP routers. Signaling
transfer points have no connections to voice trunks or
telephone lines.

• The service control point is the place of execution of
value-added services. Historically, service control points were
seen as databases only. The Advanced Intelligent Network
architecture describes them more as the place of execution
of the service logic.

• Signaling links are the physical connections between
elements of the SS7 network. They provide a full-duplex
64-kbit/s digital path conforming to the V.35, DS0A, or T1/E1
standards. A group of signaling links connecting the same
two elements can be grouped logically into a linkset. The
SS7 protocol provides procedures for redundancy and load
sharing between links of the same linkset. For example, if
a link within a linkset fails, the protocol will automatically
move the traffic to the nonaffected link and will try to
restore the failed link.

SSP = Service Switching Point
STP = Signaling Transfer Point
SCP = Service Control Point

Fig. 2. Elements of the signaling network.

Fig. 2 shows typical elements of an SS7 network. The
signaling transfer points are provisioned in pairs. Service control
points and service switching points always connect to two
different nodes. Similarly, links are redundant such that
messages can always find an alternate route in case of a
failure.

Use of the Signaling Network 

The signaling network is used for many purposes. It is used
for regular calls, allowing rapid setup and secure operation.
It is used in the wireline fixed network to provide additional
services requiring specific service logic and databases (800
numbers, alternate billing, etc.). It is used in the mobile
network to manage mobility information. For example, when
a mobile phone is switched on, the home location register
containing the subscriber profile is queried using the
signaling network.

The signaling network is now the basic infrastructure for
the global telecommunications network. SS7 networks are
deployed in almost all countries now, with variable
coverage. North American networks are defined by ANSI and
Bellcore, while the rest of the world usually follows the
ITU standard. The two flavors of the standard are similar,
but (of course!) incompatible. The ITU version is used at
the boundary of the international networks.

The SS7 Protocol Stack 

The SS7 reference model is based on the Open Systems
Interconnection (OSI) reference model of the International
Organization for Standardization (ISO), following similar
principles with layers of protocols. However, the SS7 model
is more specialized, being designed for signaling information
transfer with a specific focus on low latency and built-in
robustness.

Fig. 3 shows the SS7 protocol stack. MTP stands for Message
Transfer Part. It represents the three lower layers of the
protocol stack. The Signaling Connection Control Part (SCCP)
is built on top of MTP Level 3. The ISDN User Part (ISUP)
sits on top of MTP (and potentially SCCP also). The
Transaction Capabilities Application Part (TCAP) resides on top
of SCCP. Let's look at each layer's functions.



MTP Level 1. The physical layer of the SS7 protocol is based
on digital transmission channels known as signaling links,
connecting two digital elements with a rate of 56 kbits/s
(ANSI) or 64 kbits/s (ITU). The physical network can be
composed of V.35, DS0A, or T1/E1 links.

MTP Level 2. MTP Level 2 maps onto layer 2 of the OSI
seven-layer model and provides a basic message exchange with an
error correction mechanism based on the retransmission of
unacknowledged messages. An alignment procedure ensures,
if successful, that links are able to convey messages between
two points.

Compared with most level 2 protocols, MTP 2 has some unusual
features. For example, it keeps filling the available bandwidth
by sending small messages (fill-in signaling units or FISU),
especially when there is no user traffic. This allows it to
promptly detect any physical link failure and to react
accordingly. From an implementation point of view, this is a
very stressful feature and usually requires specific hardware
and firmware. Another interesting feature of MTP 2 is its
ability to return unacknowledged frames stored in its buffers
to the upper layer (MTP 3) in case of failure. This allows
MTP 3 to retrieve the frames from a failed link and send
them again on another link without any data loss.

MTP Level 3. MTP Level 3 handles the routing functions and
network management procedures of the SS7 network. MTP 3
is the key contributor to the built-in robustness of the SS7
network. The network management functions are the most
complex features of the SS7 protocol. They are in charge
of maintaining the integrity of the signaling network. These
functions can be split into three areas: link management,
traffic management, and route management.

Link management is responsible for the integrity of one link.
It uses services (especially counters) provided by MTP 2 to
monitor the quality of a link. If the link is considered to be
in error (excessive error rate, for example), then the link
is removed from service, messages are rerouted to alternate
links, and the adjacent node is notified to do the same. MTP 3
then starts an alignment procedure to try to restart the link
in a clean state.

TCAP = Transaction Capabilities Application Part
SCCP = Signaling Connection Control Part
ISUP = ISDN User Part
MTP n = Message Transfer Part Level n

Fig. 3. The SS7 protocol stack.

Fig. 4. The SS7 network as seen by the Message Transfer Part
Level 3 (MTP 3) on the local node.

Traffic management handles traffic on the links within a
linkset. It is in charge of load-sharing the traffic over all
the active links and of rerouting traffic from a failed link to
another.
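
Load sharing and rerouting within a linkset can be illustrated
with the Python sketch below. It assumes each message carries an
integer selector (real MTP 3 uses a signaling link selection
field for this purpose); the modulo mapping and the class and
link names are simplifications for illustration only.

```python
from typing import Iterable, List, Set


class Linkset:
    """Toy linkset that spreads traffic over its in-service links."""

    def __init__(self, links: Iterable[str]) -> None:
        self.links: List[str] = list(links)
        self.in_service: Set[str] = set(self.links)

    def fail(self, link: str) -> None:
        self.in_service.discard(link)

    def restore(self, link: str) -> None:
        if link in self.links:
            self.in_service.add(link)

    def select(self, selector: int) -> str:
        """Pick the link that carries a message with this selector."""
        active = [link for link in self.links if link in self.in_service]
        if not active:
            raise RuntimeError("no link available in linkset")
        # Messages with the same selector always use the same link,
        # which preserves their order while all links stay up.
        return active[selector % len(active)]
```

When a link fails, the same call transparently maps its share of
the traffic onto the remaining links.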

Route management is in charge of maintaining information
on the network topology and the availability or unavailability
of certain paths to reach destination nodes. The interesting
feature of MTP 3 is that it only knows adjacent nodes and
destination nodes and not other intermediate nodes (Fig. 4).
It maintains a table of available routes to reach a destination
node via an adjacent node.

The other functions of MTP 3 are message routing,
discrimination, and distribution. On the outbound path, this means
finding the right link to reach the target destination node.
On the inbound path, it has to determine for which
higher-level protocol the packet is intended (ISUP, SCCP, etc.). It
can reroute the packet via the network if it determines that
it is not the intended target.
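
The inbound discrimination and distribution decision can be
sketched in Python as follows. The function name, the string
results, and the handler table are hypothetical; they only
illustrate the order of the two checks.

```python
from typing import Callable, Dict


def distribute(own_point_code: int, dest_point_code: int, user_part: str,
               payload: bytes,
               users: Dict[str, Callable[[bytes], str]]) -> str:
    """Decide what to do with one inbound message."""
    # Discrimination: is this node the intended target?
    if dest_point_code != own_point_code:
        return "reroute:%d" % dest_point_code
    # Distribution: hand the payload to the right higher-level protocol.
    handler = users.get(user_part)
    if handler is None:
        return "discard"
    return handler(payload)
```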

SCCP. SCCP is built on top of the MTP 3 layer and provides
additional end-to-end services such as connectionless or
connection-oriented service, extended addressing, and
network management functions.

SCCP users are assigned a specific address called a
subsystem number which, along with the MTP 3 address (called a
point code), makes it possible to uniquely address an SCCP
user in the SS7 network. The extended addressing feature of
SCCP allows the use of a label or global title in place of the
subsystem number and point code to address an application.
This allows for symbolic addressing and provides a level of
indirection with respect to the physical structure of the SS7
network. The translation of the global title to the subsystem
number and point code is accomplished either in the
signaling transfer points or in the endpoints. Very often, the dialed
digits are used as the global title.
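
Global title translation is essentially a table lookup from a
symbolic label to a (point code, subsystem number) pair. The
Python sketch below assumes a longest-prefix match on the dialed
digits, which is one plausible matching rule and not something
the article specifies; the table contents are invented for
illustration.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[int, int]  # (point code, subsystem number)


def translate(global_title: str,
              table: Dict[str, Address]) -> Optional[Address]:
    """Translate a global title by longest-prefix match (assumption)."""
    for length in range(len(global_title), 0, -1):
        hit = table.get(global_title[:length])
        if hit is not None:
            return hit
    return None  # no translation available
```

Because callers address the label rather than the point code,
an application can be moved to another node by changing only
the translation table.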

In connectionless mode, SCCP operates a bit like UDP (User
Datagram Protocol) operates in the Internet world: messages
are sent to a target address (either a global title or a
subsystem number and point code) and are transmitted from node
to node by MTP 3 to the final destination. There is no
guarantee of delivery of the message, nor is it guaranteed that
the messages will arrive in the order in which they
were sent. The connectionless mode is the most widely
used, especially because TCAP uses it.

In connection-oriented mode, SCCP operates a bit like X.25. 
A virtual circuit must first be opened before data transfer 
can take place. Once the circuit is open, there is a guarantee
that messages are delivered and in the right order. 

SCCP also has built-in network management functions. Each
node in the network maintains the state of its SCCP users
identified by their subsystem number. The SCCP layer is in
charge of broadcasting the state of its own subsystem num-
bers to the other nodes, so that at any time, an SCCP user
knows about the state of the remote subsystems.

TCAP. The objective of the Transaction Capabilities Applica-
tion Part is to provide the means of transferring noncircuit-
related information (unlike ISUP, which handles circuit-
related information) between different nodes of the SS7
network. TCAP is especially used to access service control
points in the fixed network or to access home location regis-
ters, a short message service center (SMSC), or an equipment
identification center (EIC) in the mobile network.

The TCAP layer is divided into two sublayers. The first, the
transaction sublayer, deals with the exchange of TCAP
messages. A transaction, called a dialogue at the user level,
can be unstructured (composed of one unidirectional TCAP
message) when no explicit initiation or termination is needed.
For more interaction, a structured dialogue is used with a
beginning, an exchange, and a termination or an abortion.
This sublayer uses the SCCP connectionless service.

The upper sublayer is the component sublayer and is dedi-
cated to operations. An operation is an action (with parame-
ters) to be performed by the remote end. Each operation is
encoded into components, which are part of a TCAP message
payload. Components convey either an operation request or
an operation response. Simultaneous operations are allowed
inside a transaction and TCAP is able to support multiple
simultaneous transactions with different remote TCAPs. The
addressing for each TCAP user is the addressing provided
by SCCP (point code and subsystem number or global title).
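The split between the two sublayers can be sketched as below.
The message type names (begin, continue, end) follow ITU-T
usage; the class layout, the dictionary message format, and the
operation names used in any example are hypothetical and serve
only to show how components ride inside transaction messages.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    kind: str                 # "invoke" (operation request) or "result" (response)
    invoke_id: int
    operation: str = ""
    parameters: dict = field(default_factory=dict)


class Dialogue:
    """One structured transaction as seen by a TCAP user (illustrative)."""

    def __init__(self, transaction_id):
        self.transaction_id = transaction_id
        self.state = "idle"
        self.pending = {}          # invoke_id -> operation awaiting a response
        self._next_id = 1

    def invoke(self, operation, **parameters):
        """Component sublayer: build an operation request."""
        component = Component("invoke", self._next_id, operation, parameters)
        self.pending[component.invoke_id] = operation
        self._next_id += 1
        return component

    def send(self, components, last=False):
        """Transaction sublayer: wrap components in a TCAP message."""
        if self.state == "closed":
            raise RuntimeError("dialogue already terminated")
        if self.state == "idle":
            kind, self.state = "begin", "active"
        elif last:
            kind, self.state = "end", "closed"
        else:
            kind = "continue"
        return {"type": kind, "transaction_id": self.transaction_id,
                "components": list(components)}

    def on_result(self, component):
        """Match a response to the operation that requested it."""
        return self.pending.pop(component.invoke_id)
```

Several invocations can be outstanding at once inside one
dialogue, which mirrors the simultaneous operations allowed
inside a transaction.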

Fig. 5 shows a TCAP interaction with the separation of the
transaction layer and the component layer.



Fig. 5. A TCAP (Transaction Capabilities Application Part) trans-
action, showing the transaction layer and the component layer.
(ITU-T terminology is shown. ANSI has equivalent services.)



60 August 1997 Hewlett-Packard Journal

© Copr. 1949-1998 Hewlett-Packard Co.



IAM = Initial Address Message
ACM = Address Complete Message
ANM = Answer Message

Fig. 6. A typical ISUP (ISDN User Part) interaction.

ISUP. The ISDN User Part (ISUP) is a circuit-related protocol,
which means that it defines and transports the necessary
messages to set up, tear down, and control voice and data
circuits. It uses the MTP 3 services to transport messages
from switches to switches.

A typical ISUP interaction is shown in Fig. 6. User A takes
the receiver off the hook and dials 555-xxxx. The local
switch (333) looks up its routing table and finds out that
it should route the call to switch 444, which is an access
tandem (not the final destination). It then sends an initial
address message to switch 444 via the SS7 network and
reserves a voice circuit to switch 444. When switch 444
receives the initial address message, it reserves the other
end of the voice circuit, finds out that the call should be
routed to switch 555, and sends another ISUP initial address
message via SS7 to switch 555. Switch 555 accepts the call,
reserves the voice circuit with switch 444, and sends back
an address complete message to switch 444, which forwards
it to switch 333, triggering the ring-back of user A (via the
voice path). Switch 555 also rings the destination phone.
When user B takes the receiver off the hook, switch 555
sends an answer message over the SS7 network to switch
444, which forwards it to switch 333. The call can now
proceed.
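The message sequence of this call setup can be reproduced with
a short sketch. The function name and the next-hop table are
assumptions made for illustration; circuit reservation and the
release phase are left out.

```python
def call_setup_messages(origin, destination, next_hop):
    """Return the ISUP messages of a successful setup as (name, sender, receiver)."""
    # Follow the routing tables hop by hop from the local switch.
    path = [origin]
    while path[-1] != destination:
        path.append(next_hop[path[-1]])
    forward = list(zip(path, path[1:]))
    backward = [(b, a) for a, b in reversed(forward)]
    # IAM travels toward the destination; ACM and ANM travel back.
    messages = [("IAM", a, b) for a, b in forward]
    messages += [("ACM", a, b) for a, b in backward]
    messages += [("ANM", a, b) for a, b in backward]
    return messages
```

Called with `call_setup_messages(333, 555, {333: 444, 444: 555})`,
it yields two initial address messages toward switch 555,
followed by the address complete and answer messages relayed
back through switch 444 to switch 333.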

The release phase uses the same kind of message interaction.
ISUP also allows many other supplementary services.

The HP OpenCall SS7 Platform 

The HP OpenCall SS7 platform allows users to build com-
puter-based signaling applications connected to the signal-
ing network. Using computers to achieve some of the intelli-
gent network functions is one of the key benefits of the
intelligent network architecture. Compared to modifying
switch software, it is less expensive, faster, and easier to
program computers in the intelligent network. The HP
OpenCall SS7 platform provides the hardware and the
middleware necessary to use a computer in a signaling net-
work.

The main characteristics of the HP OpenCall SS7 platform
are:

• It provides the protocols to connect to the SS7 network.
This mostly consists of specialized hardware (for MTP Levels
1 and 2) and protocol stacks (MTP, SCCP, TCAP, ISUP) for
various flavors (ANSI, ITU, Chinese, hybrid).

• It provides high availability; most of the target applications
are mission-critical (see article, page 65).

• It provides the necessary components for the computer to
be integrated in a central office. This means, for example,
support of a -48Vdc power supply, antiseismic capabilities,
compliance with established standards, and so on.

• It provides open application programming interfaces (APIs)
for users to write applications.

The HP OpenCall SS7 platform is a platform in the sense that
it does not provide the application itself but rather allows
users to build the application. The platform can be instan-
tiated under several options that will be described later.

Core Protocol Implementation 

The network connection is made by a dedicated communica-
tions unit called the signaling interface unit (see Fig. 7).
Each signaling interface unit has a SCSI interface card for
the host connection and three slots for signaling link inter-
face cards. These cards provide two links each, with differ-
ent options for each supported type of link (V.35, DS0A, and
T1/E1). These cards come from the HP 37900 SS7 protocol
analyzers. The signaling interface unit can also be expanded
by means of a dedicated expansion box, which provides four
additional slots (eight more links) for a single SCSI attach-
ment. Signaling interface units can be chained on the SCSI
bus and the platform can currently support up to 64 links.

Each signaling link interface card runs the MTP 1 and MTP 2
protocols and sends and receives messages to and from the
host via the SCSI interface.

On the host side, messages are read by an SS7 driver built
on top of the SCSI driver in the HP-UX* kernel. On top of
these, a single user space process implements MTP 3, SCCP,
and TCAP, sending and receiving messages to and from the
SS7 driver.

Fig. 7. The network connection is made by a dedicated commu-
nications unit called the signaling interface unit. Each signaling
interface unit has a SCSI interface card for the host connection
and three slots for signaling link interface cards.

Fig. 8. For each layer of higher-level protocols, an operation,
administration, and maintenance (OA&M) programmatic access
is provided.

APIs are provided as a library to be linked with the user pro-
cess. The library is in charge of managing the interaction
with the user application, implementing interprocess com-
munication between the application and the SS7 stack, and
supporting the flow control.

For each layer of higher-level protocols, an operation, admin-
istration, and maintenance (OA&M) programmatic access is
provided (Fig. 8). This allows an application developer to
control the state of the protocol stack or to implement man-
agement applications (monitoring, configuration, etc.).

Each layer is directly accessible via a direct API. Some APIs
are simple wrappers that get the user parameters and mar-
shal them to the stack. Other APIs, such as the ISUP and
TCAP APIs, implement some part of the protocol in the
library itself, allowing wider distribution of the processing
load. All of the APIs are asynchronous to allow for high
transaction rates.

High Availability 

As explained above, one of the key aspects of the platform
is high availability. The SS7 network has built-in high avail-
ability capabilities and it is important that the end node also
provide these capabilities.

Our solution is based on the active/standby model (see the
article on page 65 for more details). To eliminate any single
point of failure, every element is replicated (see Fig. 9). Two
hosts are used, one being the active host and the other the
standby host. Only the active host processes the traffic, while
the standby just keeps its state up to date. For network
attachment, we use dual-ported signaling interface units,
and each unit is connected to two different SCSI chains
terminating at each host. The dual-ported signaling interface
unit has built-in logic such that one and only one SCSI bus
can be active at any point in time.

Each host has two SCSI interface cards, each connected to
one half of the signaling interface unit set. Fig. 9 also shows
how signaling interface units are chained. The active SS7
stack, running on the active host, uses the two SCSI chains
terminating at it. The standby stack controls its two SCSI
chains but does not have control of the dual-ported signaling
interface units. The active stack has control of the signaling
interface units via its two SCSI interfaces. In case of a failure
of the active side, the standby side will take over and will
take control of all the signaling interface units by using its
two SCSI buses. The switchover happens in less than six
seconds to be transparent from the SS7 network point of
view. Refer to the article on page 65 for more details on the
mechanism.

Fig. 9. High availability of the HP OpenCall SS7 platform is based
on the active/standby model.

The SS7 links of a given linkset must be connected to two
different signaling interface units so that if one signaling
interface unit happens to fail, the SS7 traffic will be routed
transparently by the network to the surviving signaling inter-
face units. Therefore, from a network attachment point of
view, the architecture is more a load-sharing architecture,
whereas it is an active/standby architecture at the host level.

From an application point of view, the API hides the fact
that there are in fact two SS7 stacks running. Application
developers are free to use their own high availability mecha-
nism, either load shared or active/standby.
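The ownership rule for the dual-ported signaling interface
units can be sketched as follows. The class and method names
are hypothetical; the sketch only models the rule that one and
only one host controls a unit at any time and that a switchover
moves every unit to the former standby.

```python
class SignalingInterfaceUnit:
    """Dual-ported unit: exactly one of its two SCSI ports is active."""

    def __init__(self, name, hosts):
        self.name = name
        self.hosts = tuple(hosts)   # the two hosts the unit is cabled to
        self.owner = None

    def take_control(self, host):
        if host not in self.hosts:
            raise ValueError(host + " is not attached to " + self.name)
        # Selecting one port implicitly deselects the other.
        self.owner = host


class Platform:
    """Active/standby host pair sharing a set of signaling interface units."""

    def __init__(self, active, standby, units):
        self.active, self.standby = active, standby
        self.units = list(units)
        for unit in self.units:
            unit.take_control(active)

    def switchover(self):
        """The standby becomes active and claims every unit."""
        self.active, self.standby = self.standby, self.active
        for unit in self.units:
            unit.take_control(self.active)
        return self.active
```

The timing constraint (completing the switchover in less than
six seconds) is not modeled here.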

Distribution 

Another important aspect of the HP OpenCall SS7 platform
is its ability to support distributed applications. The key con-
cept here is a front-end/back-end mode. The front-end com-
puter supports the SS7 connection and protocol and the
back-end computer supports the application. A typical con-
figuration is shown in Fig. 10.

The SS7 stack is able to distribute the traffic among several
instances of the application running on back-end nodes. The
application instances can run on several nodes and several
instances can run on the same host. The API completely
hides the distribution and the active and standby instances
of the stack. Thus, an application can be configured to run
either on a simplex node (no high availability), on a duplex
node (active/standby), or on a back-end node without modi-
fying anything in the code. All connections between the vari-
ous systems are made over a dual LAN (possibly FDDI for
high-end systems) to eliminate any single point of failure.

This flexibility allows users to use their own high availability
and distribution schemes.

Fig. 10. A front-end/back-end mode allows the HP OpenCall
SS7 platform to support distributed applications.







Stack Implementation

As mentioned earlier, MTP 3, SCCP, and TCAP are imple-
mented in a single user space process. The protocol imple-
mentation started in 1988. The SS7 stack uses object-ori-
ented technology and a message passing bus for interobject
communication. Fig. 11 shows the stack implementation.

Each layer has a software bus instantiated. Entities can dy-
namically register on the bus, specifying what kind of mes-
sages they are interested in. Entities are object classes that
model elements of the protocol. Typical objects are MTP 3
links, SCCP remote subsystem numbers, and so on. Each
object instance is associated with a unique key (object iden-
tifier), usually extracted out of the protocol information,
which allows very efficient dispatching by the bus. Entities
can send messages on the bus to be multicast to the target
entities, calling one of their base class methods.



Fig. 11. SS7 stack implementation in the HP OpenCall SS7
platform.

This method has proved to be very efficient in terms of
encapsulation and coupling between objects. (Note that
the MTP 2 layer does not implement the MTP 2 protocol but
rather provides the interface with the MTP 2 implemented in
the signaling interface units.)
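A software bus of this kind can be sketched in a few lines.
All names below are hypothetical and the dispatch structure is
an assumption: the sketch indexes subscribers by message kind
and object key so that dispatching is a single dictionary
lookup, which is one plausible way to obtain the efficient
dispatching described.

```python
from collections import defaultdict


class SoftwareBus:
    """Message passing bus with key-based multicast (illustrative)."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # (kind, key) -> entities

    def register(self, entity, kind, key):
        self._subscribers[(kind, key)].append(entity)

    def send(self, kind, key, payload):
        """Multicast to every entity registered for (kind, key)."""
        targets = self._subscribers.get((kind, key), [])
        for entity in targets:
            entity.on_message(kind, payload)
        return len(targets)


class Entity:
    """Base class for objects that model elements of the protocol."""

    def __init__(self, bus, key):
        self.bus, self.key = bus, key
        self.received = []

    def subscribe(self, kind):
        self.bus.register(self, kind, self.key)

    def on_message(self, kind, payload):
        self.received.append((kind, payload))


class Mtp3Link(Entity):
    """Example entity: one MTP 3 link, keyed by its link identifier."""
```

Senders need to know only the message kind and the key, not the
receiving objects, which keeps coupling between entities low.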

Message Set Customization

To extend the capabilities of the SS7 platform, it is neces-
sary to provide more built-in protocols as they are adopted.
These new protocols are built on top of TCAP and are used
in the intelligent network or in the mobile network. Al-
though standardized, the flavors of these protocols vary
from network to network and from vendor to vendor. The
same applies for ISUP, for which there is about one version
per country.

Fig. 12. Message set customization technology consists of a
message set compiler in conjunction with a generic run-time
engine to process the encoded messages.

To deal with the diversity of message formats at the product
level without having to do a special version for every new
flavor of the protocol, we have developed a message set
customization technology to automate the customization
of a protocol. This consists of a message set compiler in
conjunction with a generic run-time engine to process the
encoded messages (see Fig. 12).

The format of the messages used by the protocol is defined
in Abstract Syntax Notation 1 (ASN.1) with some specific
annotations. ASN.1 is the OSI standard for defining data
structures and is used by most of the protocols that we
implement. However, the message set compiler technology
is not restricted to ASN.1. A protocol such as ISUP, which
is not defined in ASN.1, can also be accommodated.

The compiler generates a metadata file, which contains the
definition of the messages. The run-time engine loads these
metadata files and can immediately encode and decode new
definitions of messages without impacting the API or requir-
ing recompilation or relinking. The benefit of this technology
is that it can adapt the product to the user's exact specifica-
tions at the latest possible stage, without impacting the core
product.
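The idea of a generic engine driven by loadable metadata can be
sketched as below. This is a simplified stand-in: the metadata
layout and the message names are invented for the example, and
fixed-width packing with Python's `struct` module replaces the
actual ASN.1 encoding rules, which are not reproduced here.

```python
import struct

# Hypothetical metadata, as a compiler might emit it:
# message name -> ordered list of (field name, struct format code)
METADATA = {
    "Ping": [("sequence", "H"), ("flags", "B")],
}


class RuntimeEngine:
    """Generic encoder/decoder that knows messages only through metadata."""

    def __init__(self, metadata):
        self.metadata = dict(metadata)

    def load(self, definitions):
        """Add or replace message definitions at run time."""
        self.metadata.update(definitions)

    def _format(self, name):
        # ">" selects big-endian byte order with no padding.
        return ">" + "".join(code for _, code in self.metadata[name])

    def encode(self, name, values):
        fields = self.metadata[name]
        return struct.pack(self._format(name), *(values[f] for f, _ in fields))

    def decode(self, name, data):
        fields = self.metadata[name]
        unpacked = struct.unpack(self._format(name), data)
        return dict(zip((f for f, _ in fields), unpacked))
```

A new message flavor is supported by calling `load` with its
definition; the engine code and its interface stay unchanged.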

Performance 

The HP OpenCall SS7 platform can handle more than 2000
SS7 transactions per second, a transaction being defined as
one message into a dummy application and one message out
from the application. These figures were measured on an HP
9000 Model K420 host computer. The implementation is
CPU-bound, so its capacity automatically increases when-
ever more powerful system hardware becomes available.

The constraints set up for the development were rather
stringent and very similar to in-kernel development. For
example, no file system access is allowed except at startup,
and all calls must be asynchronous. We are closely watching
the I/O traffic to avoid falling into I/O bottlenecks. One of
the reasons why we do not get I/O bottlenecks is that we
group as many SS7 messages as possible together before
doing any transfer. For SCSI, this is mandatory because
SCSI is architected for rather large data transfers whereas
SS7 handles very small messages (~100 bytes). Therefore,
the signaling interface unit and the SCSI driver purposely
introduce some latency to transfer larger data blocks.
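The grouping of small messages into larger transfers can be
sketched as follows. The class name and the size threshold are
assumptions; a real driver would also flush on a timer to bound
the added latency and would frame each message so the receiver
can split the block again, both of which are omitted here.

```python
class Batcher:
    """Accumulate small messages and hand them over in larger blocks."""

    def __init__(self, transfer, max_bytes=4096):
        self.transfer = transfer      # callable that performs one I/O transfer
        self.max_bytes = max_bytes    # hypothetical block size limit
        self._pending = []
        self._size = 0

    def submit(self, message):
        # Flush first if this message would overflow the current block.
        if self._pending and self._size + len(message) > self.max_bytes:
            self.flush()
        self._pending.append(message)
        self._size += len(message)

    def flush(self):
        """Send whatever is pending as a single transfer."""
        if self._pending:
            self.transfer(b"".join(self._pending))
            self._pending, self._size = [], 0
```

With messages of about 100 bytes and a block limit of a few
kilobytes, one transfer carries dozens of messages instead of
one.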

HP-UX 9.* and 10.0 for HP 9000 Series 700 and 800 computers are X/Open Company UNIX 93
branded products.

UNIX is a registered trademark in the United States and other countries, licensed exclusively
through X/Open Company Limited.

X/Open is a registered trademark and the X device is a trademark of X/Open Company Limited
in the UK and other countries.
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High Availability in the HP OpenCall 
SS7 Platform 



Fault tolerance in computer systems is discussed and high availability
is defined. The theory and operation of the active/standby HP OpenCall
solution are presented. Switchover decision-making power is vested in
a fault tolerance controller process on each machine.

by Brian C. Wyld and Jean-Pierre Allegre



Our lives are increasingly dependent on our technology. Some
things are trivial, like being able to watch our favorite TV
program, while some are much more important, like medical
equipment. When you start to look at the difference technol-
ogy makes to our lives, starting from the convenience, you
begin to appreciate the problems we would have if it were to
break down.

Some breakdowns are merely irritating. Not being able to
phone to arrange a night out isn't going to kill anyone. But
when you can't phone for help in an emergency, then a lack
of the service we all take for granted is a lot more serious.
Being able to ensure that the technology we use every day
keeps working is therefore a part of the functionality, as
much as providing the service in the first place.

Although it would be nice if things never broke down, we all
know that this is impossible. Everything has a flaw; entropy
always gets us in the end. The unsinkable ship sinks, the
uninterruptible power supply gets cut, the unbreakable plate
proves to be only too breakable (usually with the assistance
of a child).

We all depend in one way or another on the continued func-
tioning of our technology, so we all have some dependence
on the tolerance of that technology to the faults that will in-
evitably strike it. When a computer malfunction can disrupt
the lives of millions of people, fault tolerance is not just
nice, but absolutely necessary.

Computer Fault Tolerance 

Computer fault tolerance covers a range of functionality.
The important aspects to consider are the speed of recovery
in the presence of a fault, the perception of the users in case
of a fault, and the consequences to the application of a fault.
With these in mind, the following degrees of fault tolerance
are often defined.

Reliable Service. The system is built to be as reliable as pos-
sible. No effort to provide tolerance of faults is made, but
every part of the system is engineered to be as reliable as
possible to avoid the possibility of a fault. This includes both
the hardware, with either overengineered components or
rigorous testing, and the software, with design methods that
attempt to ensure bug-free code and user interfaces designed
to prevent operator mistakes. Reliability rates of 99.99% can
be achieved.



High Availability. The emphasis is on making the service
available to the user as much of the time as possible. In the
event of a fault, the user may notice some inconsistency or
interruption of service, but will always be able to reconnect
to the service and use it again either immediately or within a
sharply bounded period of time. A reliability rate of 99.999%
is the target.

Continuous Availability. This is the pinnacle of fault tolerance.
When a fault occurs, the user notices no interruption or in-
consistency in the service. The service is always there, never
goes away, and never exhibits any behavior that leads the
user to think that a fault might have happened. Needless to
say, this level is both difficult and expensive to obtain. The
reliability rate is 100%.

In general, a fault tolerant system will be highly available in
most parts, with touches of continuous availability.

Achieving Fault Tolerance

Fault tolerance is usually achieved by using redundant com-
ponents. Based on the assumption that any component, no
matter how reliable, will eventually either fail or require
planned maintenance, every component in the system that
is vital is duplicated. This redundancy is designed so that the
component can be removed with a minimal amount of dis-
ruption to the operation of the service. For instance, using
mirrored disk drives allows a disk to fail or be disconnected
without altering the availability of the data stored on the disk.

The way the redundancy is designed varies according to the
paradigm used in the system. There are several ways to
build in the redundancy, depending on the use made of the
redundant components and how consistency is maintained
between them.

Multiple Active. The redundant components may in fact be
used simply to provide the service in a load-sharing way.
In this case, data and functionality are provided identically
by all the components. The load from the users of the ser-
vice is spread across the components so that each handles
a part of the load. In the event of a component failure, the
load is taken up by the others, and their load increases
correspondingly.

Active/Standby. In this paradigm, there exist one or more
active components, which provide the service to the users,
and in parallel, one or more standby components. These
provide a shadow to the actives and in the event of an active
failing, change state to become active and take over the job.
Variations on the theme involve one-to-one active/standby
pairs, N actives to one standby, and N actives to M standbys.
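The two paradigms can be contrasted with a small sketch. Both
functions and their data layouts are hypothetical; even load
spreading and the choice of which standby to promote are
simplifying assumptions.

```python
def share_load(total_load, in_service):
    """Multiple active: spread the load evenly over components still up."""
    alive = [name for name, up in in_service.items() if up]
    if not alive:
        raise RuntimeError("no component left to carry the load")
    return {name: total_load / len(alive) for name in alive}


def fail_over(roles, failed):
    """Active/standby: mark `failed` as failed and promote one standby, if any."""
    updated = dict(roles)
    was_active = updated[failed] == "active"
    updated[failed] = "failed"
    if was_active:
        for name, role in updated.items():
            if role == "standby":
                updated[name] = "active"
                break
    return updated
```

With three load-sharing components carrying 900 units in total,
each handles 300; after one fails, the remaining two handle 450
each. In the N actives to M standbys case, `fail_over` can be
applied repeatedly until no standby is left.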

Coupling 

How the service is affected by the failure is also important.
The redundant component, whether it was also providing
the service or not, can be loosely or tightly coupled to the
other components.

When loosely coupled, the redundant component only has a
view of the state of the active at certain times, at the end of
a transaction, for instance. This has the effect that a failure
of the component while processing a user's request will lose
the context. Any finished work will be unaffected, but work
in progress is lost and the user must restart. However, the
effort required to keep the standby coupled is low.

A tightly coupled component will remain in step with the
component processing the request, so that it can take over
seamlessly in the event of the fault. The workload is much
higher because many more messages must be exchanged,
and the speed of the operation may be slower to ensure that
at every stage the standby is in step with the active.

Of course, many shades and granularities of loose versus
tight coupling are possible in a single system.

Traditionally, hardware fault tolerant systems have been ex-
ponents of the tight coupling paradigm: two or more proces-
sors executing exactly the same instructions in synchroniza-
tion, with the outputs selected on either an active/standby
basis or by a voting system. Software systems have leaned
more towards the loose coupling method, at various levels
of granularity. For instance, there are database transactional
paradigms in which user database accesses are bundled into
transactions, and only once a transaction is committed does
that transaction become certain and unaffected in the event
of a failure.

Predicting and Measuring Fault Tolerance 

Various statistical methods exist to measure the fault toler-
ance of a system in a quantitative manner. These usually use
the standard measures of system failure such as MTBF
(mean time between failures) and MTTR (mean time to
repair), and are combined to give a forecast of application
downtime. However, the values of downtime produced by
such methods can be inaccurate, and sometimes bear little
resemblance to the true values.
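A common way to combine these measures is the steady-state
availability MTBF / (MTBF + MTTR). The sketch below applies it
and shows the usual formula for redundant components; the
function names are chosen for the example, and the redundancy
formula assumes that failures are independent, which is exactly
the kind of assumption that makes such forecasts optimistic.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Expected downtime per 365-day year for a given availability."""
    return (1.0 - avail) * 365 * 24 * 60


def redundant(*avails):
    """Availability of redundant components, assuming independent failures."""
    unavailable = 1.0
    for a in avails:
        unavailable *= (1.0 - a)
    return 1.0 - unavailable
```

For example, an availability of 99.999% corresponds to a little
over five minutes of downtime per year, and two independent
components of 99% availability each would give 99.99% together.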

The main reason for this is that failures may not be isolated
and uncorrelated, and this is very difficult to take into
account. Simply predicting from the MTBF and MTTR that
the chance of a single failure bringing down the entire sys-
tem is very small is not realistic when the single failure will
often provoke subsequent related failures, often in the part
of the system trying to recover from the fault. Most fault
tolerant systems and related statistical analysis are based
on an assumption of a single failure, and systems are built
to avoid a single point of failure. In practice, and in the often
inconvenient real world, failures can happen together, and
can cause other failures in turn.

The other assumption that causes trouble is that of silent
failure. It is often assumed that when a component fails, it
does so silently, that is, in a failure mode that doesn't affect
other components. For instance, having a dual LAN between
several computers to avoid the LAN's being a single point of
failure doesn't help when a crashed computer decides to
send out nonsense on all of its LAN interfaces, effectively
preventing use of any LAN.

Downtime Causes 

Things that cause downtime on systems can be grouped into
several main categories. First is the obvious computer hard-
ware failure. This may be caused by a component's lifetime
being exceeded, by a faulty component, or by an out-of-
specification component. Often, hardware failures in one
component can cause other components to fail. Many com-
puter systems are not constructed to allow a single compo-
nent to fail or to be replaced without affecting other compo-
nents. For instance, a failed disk drive on a SCSI bus will
force the entire system to be halted for its replacement even
though only one component has failed.

This often implies that avoiding the single point of failure
means adding more hardware than might seem reasonable:
a second SCSI controller card and chain, for instance, so
that the backup disk drive can be on a separate SCSI bus.
Reliable hardware, coupled with a system built to allow hot-
swappable components, can do a lot to eliminate this source
of downtime.

The second obvious cause of downtime is software failures.
No software will ever be entirely bug-free. Even formal
methods, quality reviews, and all the rest of the trappings of
computer science cannot keep those elusive problems from
slipping in. The main problem with bugs is that the ones that
escape to released systems are usually in code that has not
been well-tested. This may often be the code designed to
recover from failures, since this is difficult to cover fully
with testing. Recovery may also often involve a higher load
than normal as standby processes become active, load files,
and so on, and this can often expose lurking bugs. The net
effect is that when your application crashes and burns, your
standby application, ready and waiting to continue the ser-
vice, promptly follows it down in flames instead.

Another not so obvious but very real source of downtime
is operator intervention. Although in theory, operators of a
system will always follow procedures and always read the
manual, in practice they are prone to typing rm * (deleting all
of the files on the disk), and pulling out the wrong power
plug. Even when the mistake is not so obvious, mistakes
such as badly configured systems, too much load on a criti-
cal system, or enabling unneeded tracing or statistics can
bring the system down.

No amount of clever fault tolerant algorithms or mathemati-
cally proven designs will help here. However, a carefully
planned system configuration, with working defaults and
a user interface that is designed to help the user make the
correct choices by presenting the correct information in
a timely and obvious fashion, can go a long way towards
avoiding these sorts of problems. This is, unfortunately, an
often neglected part of the system design, with attendant
problems. When building a system that end users will use as
a platform, with programming interfaces, the importance of
providing usable interfaces becomes even greater.

Finally, there are the disasters. The absolutely unforeseeable,
one-in-a-million, couldn't happen in a lifetime chances. Who
could predict that the earthmoving equipment would knock
down the pylon supplying the mains electricity to the com-
puter center, which would fall onto the telecommunications
lines, blowing every piece of data communications equip-
ment, after which it would careen into the hole where the
water main was being repaired, breaking it and flooding the
basement where the batteries for the UPS are installed,
completely destroying them? "Impossible," you might say,
but it has happened. In such cases, geographically separated
sites can prove to be the only possible solution if a 100%
available system is really required. This does rule out certain
forms of fault tolerance (any form of dual-ported hardware,
for instance, or lockstep processors) but is possible with
software fault tolerance techniques.

Telecommunications Fault Tolerance 

The requirements on a fault tolerant system vary with
the application. In telecommunications, we see different
requirements being demanded depending on the element
being addressed. On the billing services side, the require-
ments are biased towards ensuring that no data loss occurs.
Limited application downtime is acceptable but any billing
data should be safe. This sort of system is similar to the
requirements of any database-oriented application, and tech-
nologies such as mirrored disks and reliable systems are
usually sufficient.

For operations services, which provide the management of
the network, certain essential administration and manage-
ment functions should always be available so that control
over the network is always maintained. In the service provi-
sion environment for which the HP OpenCall SS7 platform is
designed, the essential requirements are to avoid disruption
to the network, to have a continuously available service, and
to avoid disruption to calls in progress in the event of a
fault.

To avoid disruption to the network, the SS7 protocol provision
has to avoid ever being seen as down by the network. This
essentially means that in the event of a fault, normal protocol
processing must resume within six seconds. Any longer than this
and the SS7 network will reconfigure to avoid the failed node.
This reconfiguration process can be very load-intensive and can
cause overloads in the network. This effect is to be avoided at
all costs.

To provide continuous availability requires that the application
that takes over service processing from a failed application
must be at all times in the same state with respect to its
processing. This is also required to ensure that current calls
in progress are not disrupted. The state and data associated
with each call must be replicated so that the user sees no
interruption or anomaly in the service.



HP OpenCall Solution 

To fulfill all of these requirements for a telecommunications
services platform is not an easy task. We chose to implement
a simple active/standby high availability fault tolerance model
that is capable of providing most customer needs.

To achieve high availability, we need to replicate all the
hardware components (see Fig. 1). We have defined a platform
as being a set of two computers (usually HP 9000 Series 800)
interconnected by a dual LAN, equipped with independent
mirrored disks and sharing a set of SS7 signaling interface
units via redundant SCSI chains (see the article on page 58
for more details).

The highest constraint on the system is to be able to perform
a switchover in less than six seconds. The SS7 protocol MTP
Level 2 running on the signaling interface unit can tolerate a
6-s interval without traffic. If this limit is exceeded, the SS7
network detects this node as down, triggering a lot of alarms,
and we've missed our high availability goal.

In a nutshell, the high availability mechanism works as
follows. One system is the active system, handling the SS7
traffic and controlling all the signaling interface units. In
case of a failure on the active side, the standby system gets
control of the signaling interface units and becomes the new
active. During the transition, the signaling interface units
start buffering the data. When the buffers are full (which
happens rapidly), the signaling interface units start sending
MTP Level 2 messages to the other end to signal a transient
outage. If this outage lasts more than 6 s, the SS7 network
detects this node as down, so it is critical that in less than
6 s, a new active system take over. The failure detection time
is the most crucial one. We need to detect failures in less than
four seconds to be able to perform a safe switchover.
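
The timing constraint above can be written down as a small budget check. This is only a sketch: the 6-s outage limit and the 4-s detection target come from the text, while the function names and the idea of passing a measured takeover time are assumptions made for illustration.

```c
#include <stdbool.h>

/* Figures quoted in the text: the SS7 network declares the node down
   after 6 s without traffic, and failures should be detected in
   less than 4 s. */
#define OUTAGE_LIMIT_MS     6000
#define DETECTION_BUDGET_MS 4000

/* Time left for the standby to take control of the signaling
   interface units once a failure has been detected. */
int takeover_budget_ms(int detection_ms)
{
    return OUTAGE_LIMIT_MS - detection_ms;
}

/* True when detection is fast enough and detection plus takeover
   fits strictly inside the outage limit. */
bool switchover_is_safe(int detection_ms, int takeover_ms)
{
    if (detection_ms < 0 || takeover_ms < 0)
        return false;
    return detection_ms < DETECTION_BUDGET_MS &&
           detection_ms + takeover_ms < OUTAGE_LIMIT_MS;
}
```

With a 3-s detection time, for example, `takeover_budget_ms(3000)` leaves 3 s for the standby to seize the signaling interface units and resume traffic.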




Fig. 1. The high availability solution in the HP OpenCall SS7
platform calls for replication of all hardware components.

Fig. 2. A process starting up with an active process running
begins in the down state. When it reaches the hot standby state,
it is ready to become the active if the active goes down.

The architecture goal is to protect the platform from failing
in case of a single failure. In general, the dual-failure case
leads to a service loss, even though some cases may be
recovered.

Software Model 

The HP OpenCall SS7 platform is based on an active/standby
model, with a pair of UNIX® processes providing a service.
Only one of the pair is actually doing the job at any time
(the active) while its peer process (the standby) is idle,
waiting to take over in the event of a failure.

The service provided by the process may be the SS7 protocol
stack, centralized event management, or a telecommunications
application. The standby process is not completely idle. It
must be kept up to date with the state of the active process
to be able to resume processing from the same point if a
failure occurs. When in this state, it is hot standby
(see Fig. 2).

Consider a process starting up when an active process is
already running. A process is initially down, that is, not
running. When it is started, it performs whatever startup
processing is required (booting), and then is cold standby. In
this state, it is correctly configured and could perform the
service if required, but all current clients would see their
states being lost.

This would be enough to give a highly available service, but
would not fulfill the requirement to avoid disruption to current
clients. The process must therefore now synchronize itself with
the active process, during which time it is synchronizing. Once
it is completely up to date, it is hot standby. In this state,
current clients should see no disruption if the active process
fails.

Obviously, if no active is running, the process goes to active 
from cold standby, since there are no current clients. 
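
The state progression described above can be sketched as a small transition function. The state names follow the text; the event names, the `peer_active` flag, and the function itself are hypothetical and only illustrate the transitions, not the platform's actual implementation.

```c
#include <stdbool.h>

/* States named in the article. */
typedef enum {
    ST_DOWN,
    ST_BOOTING,
    ST_COLD_STANDBY,
    ST_SYNCHRONIZING,
    ST_HOT_STANDBY,
    ST_ACTIVE
} ha_state;

/* Hypothetical event names, introduced for this sketch only. */
typedef enum {
    EV_START,       /* process is launched */
    EV_BOOT_DONE,   /* startup processing finished */
    EV_SYNC_START,  /* begin copying state from the active peer */
    EV_SYNC_DONE,   /* state is fully up to date */
    EV_ACTIVATE,    /* activate command from the controller */
    EV_FAIL         /* process failure */
} ha_event;

/* Returns the next state. Events that do not apply in the current
   state leave it unchanged. */
ha_state ha_next(ha_state s, ha_event e, bool peer_active)
{
    if (e == EV_FAIL)
        return ST_DOWN;

    switch (s) {
    case ST_DOWN:
        return e == EV_START ? ST_BOOTING : s;
    case ST_BOOTING:
        return e == EV_BOOT_DONE ? ST_COLD_STANDBY : s;
    case ST_COLD_STANDBY:
        /* With an active peer, synchronize first. With no active
           peer there are no current clients, so go active directly. */
        if (e == EV_SYNC_START && peer_active)
            return ST_SYNCHRONIZING;
        if (e == EV_ACTIVATE && !peer_active)
            return ST_ACTIVE;
        return s;
    case ST_SYNCHRONIZING:
        return e == EV_SYNC_DONE ? ST_HOT_STANDBY : s;
    case ST_HOT_STANDBY:
        return e == EV_ACTIVATE ? ST_ACTIVE : s;
    case ST_ACTIVE:
        return s;
    }
    return s;
}
```

Keeping the transitions in one function makes it easy to see that the only paths to active are from hot standby, or from cold standby when no peer is active.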

Once there exists this pair of processes, one active providing
the service and one standby providing the backup, the system
is ready to deal with a failure. When this occurs, the failure
must first be detected, and then a decision on the action to be
taken must be made.



Fault Tolerance Controller 

The HP OpenCall SS7 platform centralizes the decision-making
process into a single controller process per machine, which is
responsible for knowing the states of all processes controlled
by it on its machine. It has a peer controller (usually on the
peer machine) which controls all the peer processes. These two
fault tolerance controllers make all decisions with regard to
which process of the pair is active. Each high availability
process has a connection to both the fault tolerance controller
and to its peer.

The fault tolerance controllers also have a connection between
them (see Fig. 3). The A channel allows the two fault tolerance
controllers to exchange state information on the two system
processes and to build the global state of the platform. The A
channel also conveys heartbeat messages. The B channels allow
the fault tolerance controllers to pass commands governing the
state of the processes, and to receive their state in return. A
process cannot change state to become active without receiving
a command from the fault tolerance controller. Because the fault
tolerance controller has information on all high availability
processes and on the state of the LAN, CPU, and all peer
processes, it can make a much better decision than any
individual process. Finally, the C channels are replication
channels. They allow peer processes to replicate their state
using an application dependent protocol.

Failure Detection 

For the success of any fault tolerant system, failures of any
component must be detected quickly and reliably. This is one of
the most difficult areas of the system. The HP OpenCall SS7
platform uses several mechanisms to detect various kinds of
faults.

Fig. 3. The fault tolerance controllers make all decisions with
regard to which process is active. Each high availability process
has a connection to both the fault tolerance controller and to its
peer. A heartbeat mechanism helps detect failures.

To detect a failure of one of the high availability processes,
a heartbeat mechanism is used between the fault tolerance
controller and the high availability process via the B channel.
UNIX signals are also used to detect a failure of a child
process (the fault tolerance controller is the parent of all high
availability processes), but they provide information only
when a process exits. To detect more subtle process faults
such as deadlocks or infinite loops, the heartbeat mechanism
is required, but it has the drawback of mandating that every
high availability process be able to respond to heartbeat
messages in a timely fashion, usually around every 500 ms.
This is not so critical in our environment since we expect
our processes to behave in a quasi-real-time manner, but it
rules out using any potentially blocking calls.
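
A heartbeat check of this kind can be sketched as follows. The 500-ms period is the figure quoted in the text; the number of missed heartbeats tolerated, the structure, and the function names are assumptions made for illustration, and timestamps are assumed to come from a clock that never goes backward.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_PERIOD_MS 500   /* figure quoted in the text */

typedef struct {
    uint64_t last_reply_ms;  /* time of the last heartbeat reply */
    unsigned max_missed;     /* assumed tolerance, not from the article */
} hb_monitor;

void hb_init(hb_monitor *m, uint64_t now_ms, unsigned max_missed)
{
    m->last_reply_ms = now_ms;
    m->max_missed = max_missed;
}

/* Record a heartbeat reply received from the monitored process. */
void hb_reply(hb_monitor *m, uint64_t now_ms)
{
    m->last_reply_ms = now_ms;
}

/* True when more than max_missed heartbeat periods have elapsed
   without a reply, which covers deadlocks and infinite loops as
   well as a process that has exited. */
bool hb_failed(const hb_monitor *m, uint64_t now_ms)
{
    uint64_t silence = now_ms - m->last_reply_ms;
    return silence > (uint64_t)m->max_missed * HEARTBEAT_PERIOD_MS;
}
```

A process that blocks in a system call for longer than the tolerated silence would be reported as failed, which is why the text rules out potentially blocking calls.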

To detect system hang, we use a specific mechanism implemented
in the fault tolerance controllers. Each fault tolerance
controller, running in HP-UX real-time priority mode (rtprio),
uses a real-time timer that ticks every 2 s (typically). At every
tick, the fault tolerance controller checks the difference
between the last tick and the current time. If this difference
exceeds a certain tolerance, this means that the system has been
hung for a while, since the fault tolerance controller is
configured to have the highest priority in the system and should
therefore never be prevented from receiving a real-time timer.
Upon occurrence of such an event, the fault tolerance controller
exits after killing the other high availability child processes.
As strange as it may sound, this is the safest thing to do. If the
system has been hung for a while, the peer fault tolerance
controller should have also detected a loss of heartbeat and
should have decided to go active. If we were to let the fault
tolerance controller of the hung system keep running once it
wakes up, we would have two active systems, with all the possible
nasty effects this entails.
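
The tick comparison can be sketched as a pure function of two timestamps. The 2-s period is the typical value given in the text; treating the tolerance as slack added on top of the period, the parameter names, and the action enum are assumptions, since the article does not state how the tolerance is expressed.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_PERIOD_MS 2000   /* typical period quoted in the text */

typedef enum {
    FTC_CONTINUE,            /* timer arrived on time */
    FTC_KILL_CHILDREN_EXIT   /* system was hung: stop everything */
} ftc_hang_action;

/* True when the gap between two consecutive timer ticks shows that
   the controller was not scheduled on time. tolerance_ms is an
   assumed tuning parameter; its value is not given in the article. */
bool system_was_hung(uint64_t last_tick_ms, uint64_t now_ms,
                     uint64_t tolerance_ms)
{
    uint64_t elapsed = now_ms - last_tick_ms;
    return elapsed > TICK_PERIOD_MS + tolerance_ms;
}

/* After a hang the peer has probably gone active already, so the
   safe reaction is to kill the child processes and exit rather than
   risk two active systems. */
ftc_hang_action on_tick(uint64_t last_tick_ms, uint64_t now_ms,
                        uint64_t tolerance_ms)
{
    return system_was_hung(last_tick_ms, now_ms, tolerance_ms)
               ? FTC_KILL_CHILDREN_EXIT
               : FTC_CONTINUE;
}
```

In a real controller the timestamps would come from a monotonic clock read in the timer handler; here they are passed in so the check can be exercised in isolation.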

The two fault tolerance controllers also exchange heartbeat
messages (along with more detailed state information).
Should a heartbeat fail, the fault tolerance controller of the
active side will assume that the peer is down (for example,
because of a dual-LAN failure, a peer system panic, or a peer
fault tolerance controller failure) and will do nothing
(except log an event to warn the operator). If the fault
tolerance controller of the standby side detects this event, the
fault tolerance controller will assume that something is
wrong on the active side. It will decide to go active and will
send an activate command to all of its high availability
processes. If the old active is indeed dead, this is a wise
decision and preserves the service. In the case of a dual-LAN
failure (this is a dual-failure case that we are not supposed
to guard against), we may have a split-brain syndrome.
In our case, we use the signaling interface unit (see article,
page 58) as a tiebreaker. If the active SS7 stack loses control
of the signaling interface unit (because the peer stack has
taken control of it), it will assume that the other system is
alive and will exit, asking the fault tolerance controller not
to respawn it. Operator intervention is necessary to clear
the fault and bring the platform back into its duplex state.
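
The two decisions described above, what each controller does on loss of the peer heartbeat and how the signaling interface unit breaks a split-brain tie, can be sketched as follows. All type and function names are hypothetical; only the decision rules come from the text.

```c
#include <stdbool.h>

typedef enum { ROLE_ACTIVE, ROLE_STANDBY } ftc_role;

typedef enum {
    ACT_LOG_ONLY,    /* active side: warn the operator, keep serving */
    ACT_GO_ACTIVE    /* standby side: send activate to its processes */
} ftc_action;

/* Reaction of a fault tolerance controller to loss of the peer
   controller's heartbeat. */
ftc_action on_peer_heartbeat_loss(ftc_role role)
{
    return role == ROLE_ACTIVE ? ACT_LOG_ONLY : ACT_GO_ACTIVE;
}

typedef enum {
    STACK_CONTINUE,
    STACK_EXIT_NO_RESPAWN  /* exit and ask not to be respawned */
} stack_action;

/* Tiebreaker for the split-brain case: an active SS7 stack that no
   longer controls the signaling interface unit assumes the peer has
   taken over and steps aside. */
stack_action on_siu_control_check(ftc_role role, bool has_siu_control)
{
    if (role == ROLE_ACTIVE && !has_siu_control)
        return STACK_EXIT_NO_RESPAWN;
    return STACK_CONTINUE;
}
```

The asymmetry is deliberate: the active side never gives up service on a missing heartbeat alone, and only the shared hardware resource can force it to step down.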

Dual-LAN Support 

At the time the HP OpenCall SS7 platform project started, no
standard mechanism existed to handle dual LANs, nor did we
want to implement a kernel-level dual-LAN mechanism. We
therefore selected a user space mechanism provided by a
library that hides the dual LAN and provides reliable
message-based communication over two TCP connections
(Fig. 4).

Fig. 4. The dual-LAN mechanism is a user space mechanism
provided by a library that hides the dual LAN and provides
reliable message-based communication over two TCP connections.

A message library provides the dual-LAN capability and
message boundary preservation. The message library opens
two TCP connections, one on each LAN. Only one LAN is
used at a time (there is no attempt to perform load sharing).
On each of the TCP connections, we maintain a small traffic
of keep-alive messages (one every 500 ms), which contain
just a sequence number. On each TCP connection, the message
library monitors the difference between the sequence
numbers on each LAN. If the difference exceeds a given
threshold, one LAN is assumed to be either broken or
overloaded, in which case the message library decides to switch
and resume traffic on the other LAN. No heartbeat timer is
used. Only differences in round-trip time can trigger a LAN
switch. The benefit of this solution is that it is independent
of the speed of the remote process or remote machine and
scales without tuning from low-speed to high-speed LANs.
It also allows very fast LAN switching time.
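
The sequence-number comparison can be sketched as below. The structure, the function names, and the threshold value are assumptions; the sketch also ignores sequence number wraparound, which a real implementation would need to handle.

```c
#include <stdint.h>

typedef struct {
    uint32_t seq[2];     /* last keep-alive sequence seen on each LAN */
    int      current;    /* LAN carrying traffic: 0 or 1 */
    uint32_t threshold;  /* assumed tuning parameter */
} lan_monitor;

void lan_init(lan_monitor *m, uint32_t threshold)
{
    m->seq[0] = 0;
    m->seq[1] = 0;
    m->current = 0;
    m->threshold = threshold;
}

/* Record a keep-alive received on LAN 0 or LAN 1. */
void lan_keepalive(lan_monitor *m, int lan, uint32_t seq)
{
    if (lan == 0 || lan == 1)
        m->seq[lan] = seq;
}

/* Returns the LAN to use. Traffic moves to the other LAN only when
   the current LAN lags it by more than the threshold, so no timer is
   involved and the speed of the remote side does not matter. */
int lan_select(lan_monitor *m)
{
    int other = 1 - m->current;
    if (m->seq[other] > m->seq[m->current] &&
        m->seq[other] - m->seq[m->current] > m->threshold)
        m->current = other;
    return m->current;
}
```

Because both connections carry the same keep-alive stream, a slow remote process delays both equally and does not trigger a switch; only a difference between the two LANs does.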

A drawback is the sensitivity of this mechanism to a loaded
LAN, which is perceived as a broken LAN. For this, we
recommend that an extra LAN be added to the system dedicated
to application bulk traffic. Another problem is that when
switching LANs, we have no way of retrieving unacknowledged
TCP messages to retransmit on the new LAN, so we end up
losing messages upon a LAN switch. Some parts of the platform
guard themselves against this by implementing a lightweight
retransmission protocol.

Access to High Availability Services 

An important objective of the HP OpenCall SS7 platform is to
shield the application writer from the underlying high
availability mechanisms. We came up with the scheme illustrated
in Fig. 5 to access the high availability processes.

Fig. 5. The method of accessing the high availability processes
is designed to shield the application writer from the underlying
high availability mechanisms.

Let's take the example of the SS7 process. The SS7 library
maintains two message library connections (four TCP
connections because of the dual LAN): one with the active
instance and one with the standby. The uft library (user fault
tolerance library) transparently manages the two connections
and always routes the traffic to the active instance of the
stack. Upon switchover, the SS7 process informs its client
library via a surviving message library connection that it is
now the new active and that traffic should be routed to it.
From an API point of view, the two connections (four
sockets) are hidden from the user by exporting an fd_set as
used by select() instead of a file descriptor. The application
main loop should be built along the following lines:

    while (1) {
        API_pre_select(&rm, &wm, &em, &timeout);
        // application possibly adds
        // its own fd in the mask here
        // FD_SET(myFd, &rm);
        select(&rm, &wm, &em, &timeout);
        API_post_select(&rm, &wm, &em, &timeout);
        // application possibly checks
        // its own fd here
        // if (FD_ISSET(myFd, &rm)) {}
    }

The application main loop must continuously call the preselect
function of the API to get the accurate value of fd_set
(sockets can be closed and reopened transparently in case
of failure), then call select(), possibly after having set some
application-specific file descriptor in the mask, then call the
API postselect function with the mask returned by select(). In
the postselect phase, the library handles all necessary protocol
procedures to maintain an accurate view of the state of
the high availability process, along with user data transfer.

State Management 

One of the key aspects of the active/standby paradigm is the
definition of the process state and how precisely it must be
replicated. The framework described above does not enforce
how state management should be performed. It provides a
replication channel between the two peer processes and
information about the processes, but no specific semantics
for state. Different schemes are used depending on the nature
of the state to be replicated. A key element to consider
when designing such a system is the state information that
must be preserved upon switchover and its update frequency.
For instance, blindly replicating all state information in a
system targeted at 4000 messages per second would be highly
inefficient, because the replication load would exceed the
actual processing.



For these reasons, we have not set up a generic state
replication mechanism, but rather build ad hoc mechanisms
depending on the nature of the state. For example, on the SS7
stack, the MTP 3 protocol has no state associated with data
transfer (such as window value, pending timer, connection
state), but has a lot of network management information
that must not be lost in case of switchover.

The policy has been to intercept MTP 3 management messages
coming from the network or from the OA&M (operation,
administration, and maintenance) API and send them
to the standby via the replication channel or the API. The
standby stack processes the MTP 3 management messages
in the same way as the active and the computed states are
identical.

TCAP transactions are not replicated because of the high
rates of creation and deletion and the amount of state
information associated with the component handling. The effect
is that opened TCAP transactions are lost in case of
switchover. Work is progressing on a scheme that preserves
transactions by letting the user of the transaction decide when
the transaction becomes important and should be replicated.

An alternative to replicating messages is to replicate the
state after it has been computed from the message. The usual
algorithm for this scheme is to do the computation on the
active side, use an ad hoc protocol to marshal the new state
to the standby, and let the standby update itself. If the
standby fails to replicate the state, it decides to go to the
down state and will be restarted.

Another important design aspect in the high availability
system is the synchronization phase. A starting cold standby
system has to perform the cold standby to hot standby
transition by getting all the state information from the active
and rebuilding it locally. This operation should disturb the
active as little as possible, but care must be taken that the
algorithm converges. If the state of the active changes faster
than the standby can absorb, there is a risk that the standby
may never catch up. This is usually addressed by assuming that
the standby has much more CPU available than the active,
and if necessary, slowing down the active. In the case of SS7
signaling, the amount of state information is rather small
and the information is stable, so we use a relatively simple
algorithm. The synchronization is performed by having the
standby fork a helper process that, via the SS7 OA&M API,
dumps the content of the state information to disk and then
replays it to the standby via the API. To check that the state
is correct and that the standby can go to hot standby, the
standby stack initiates an audit phase that checks that the
two configurations are identical. If this is not the case, the
process is resumed. Otherwise, the state of the standby goes
to hot standby. This is a simple implementation, but has
proven to be sufficient for SS7.

Technical Challenges

Developing a high availability platform on the HP-UX operating
system has been a great challenge, but we've obtained a
very stable and operational product, deployed in hundreds
of sites worldwide.

One of the technical challenges was that HP-UX is not a
real-time operating system and we need determinism to handle
the high availability aspects, especially with the very small
reaction time (<6 s) that we are allowed. We've addressed
this by forbidding some operations (like file system access
and potentially blocking system calls) in time-critical
processes such as the SS7 stack, by slicing every long-lived
operation into small operations, and by trying to stay below
the saturation limit (where response time starts to increase
rapidly). For example, we recommend keeping overall
CPU utilization below 85% and staying far below the LAN
maximum bandwidth.

Another challenge was time synchronization between the
various hosts. We do not need time synchronization between
the hosts for proper operation, but some of our customers
request it. We've used the NTP package (Network Time
Protocol), which has proved to work reasonably well except
when the clock drift between the hosts was too large to be
compensated smoothly and NTP decided to suddenly jump
the system clock to catch up. This caused problems for
synchronization of events, and also fired our failure detection
mechanisms. We resolved these problems using external
clocks and configuring NTP in a controlled manner to
avoid such time jumps.

HP-UX 9.* and 10.0 for HP 9000 Series 700 and 800 computers are X/Open Company UNIX 93
branded products.

UNIX is a registered trademark in the United States and other countries, licensed exclusively
through X/Open Company Limited.

X/Open is a registered trademark and the X device is a trademark of X/Open Company Limited
in the UK and other countries.


A Benchtop Inductively Coupled
Plasma Mass Spectrometer

The HP 4500 is the first benchtop ICP-MS. It has a new type of optics
system that results in a very low random background and high sensitivity,
making analysis down to the subnanogram-per-liter (parts-per-trillion)
level feasible. It can be equipped with HP's ShieldTorch system, which
reduces interference from polyatomic ions.

by Yoko Kishi



Inductively coupled plasma mass spectrometry (ICP-MS) is
an analytical technique that performs elemental analysis
with excellent sensitivity and high sample throughput. The
ICP-MS instrument employs a plasma (ICP) as the ionization
source and a mass spectrometer (MS) analyzer to detect the
ions produced. It can simultaneously measure most elements
in the periodic table and determine analyte concentration
down to the subnanogram-per-liter or part-per-trillion (ppt)
level. It can perform qualitative, semiquantitative, and
quantitative analysis and compute isotopic ratios.

The schematic diagram of an ICP-MS instrument is shown in
Fig. 1. Basically, liquid samples are introduced by a peristaltic
pump to the nebulizer where a sample aerosol is formed. A
double-pass spray chamber ensures that a consistent aerosol
is introduced to the plasma. Argon (Ar) gas is introduced
through a series of concentric quartz tubes, known as the
ICP torch. The torch is located in the center of an RF coil,
through which 27.12-MHz RF energy is passed. The intense
RF field causes collisions between the Ar atoms, generating
a high-energy plasma. The sample aerosol is instantaneously
decomposed in the plasma (plasma temperature is on the
order of 6,000 to 10,000K) to form analyte atoms, which are
simultaneously ionized. The ions produced are extracted
from the plasma into the mass spectrometer region, which
is held at high vacuum (typically 10⁻⁶ Torr, 10⁻⁴ Pa). The
vacuum is maintained by differential pumping.

The analyte ions are extracted through a pair of orifices,
approximately 1 mm in diameter, known as the sampling
cone and the skimmer cone. The analyte ions are then
focused by a series of ion lenses into a quadrupole mass
analyzer which separates the ions based on their mass/
charge ratio (m/z). The term quadrupole is used because the
mass analyzer is essentially four parallel molybdenum rods
to which a combination of RF and dc voltages is applied.
The combination of these voltages allows the analyzer to
transmit only ions of a specific mass/charge ratio. Finally,
the ions are measured using an electron multiplier, and data
at all masses is collected by a counter. The mass spectrum
generated is extremely simple. Each elemental isotope
appears at a different mass (e.g., ²⁷Al would appear at 27 amu)
with a peak intensity directly proportional to the initial
concentration of that isotope. The system also provides isotopic
ratio information.

New Benchtop ICP-MS 

The HP 4500 is the world's first benchtop ICP-MS (see Fig. 2).
The reduction in instrument size is dramatic: the size of the
previous model is 1550 by 900 by 1450 mm, while that of the
HP 4500 is 1100 by 600 by 582 mm. Previous generations of
ICP-MS instruments had requirements (space, utilities, and
environment) that dictated that a special room be dedicated
for the instrument. Installing an ICP-MS could be
particularly difficult, since major construction changes were
often required.

Fig. 1. HP 4500 ICP-MS schematic diagram.

Fig. 2. HP 4500 ICP-MS system.

The HP 4500 is smaller and lighter so that it can be installed
on an existing bench. The layout of the instrument is
designed to make user interaction with the sample introduction
system, the interfaces, and the ion lenses routine. All parts
can be accessed from the front and connected or disconnected
easily. These and other new features and technology
introduced and used by the HP 4500 help to make ICP-MS a
more routine and therefore a more accessible technique.

Ion Lens System

The configuration of the ion lens system is one of the key
design issues because it directly affects the ion transmission
efficiency of an ICP-MS system. Various ion lens
configurations were produced and evaluated to determine the
optimum configuration and operating conditions for the HP 4500.
Ion trajectories through each ion lens system were predicted
mathematically.

The HP 4500 is equipped with a new type of optics system,
as shown in Fig. 3a. The omega lens consists of a pair of
crescent-shaped lenses that resemble the Greek letter Ω.
The optics system contains two omega lenses, the omega+
and omega- lenses, which bend the ion beam, allowing the
quadrupole and detector to be mounted off-axis. This
prevents photons from reaching the detector (which would
increase random background noise), and also focuses the
ions very efficiently. The result is a very low random
background and high sensitivity, making ultratrace analysis down
to the subnanogram-per-liter level feasible. In contrast, other
ICP-MS systems employ a photon stop lens system as shown
in Fig. 3b. Ions are defocused after extraction into the main
vacuum chamber and then refocused while photons are
blocked by the photon stop. With this design, some ions
inevitably collide with the photon stop and are lost, so overall
transmission is reduced.

An example of ion trajectory mapping for the optics system
of Fig. 3a is shown in Fig. 4. In this example, the initial ion
energy was estimated at 10 eV and the space-charge effect
was ignored. The broad trace in the center shows the ion
trajectories for the lens voltage settings shown. Starting from
the left, the lenses and their voltages are: skimmer cone (no
voltage), extraction lens 1 (-IfiOV), extraction lens 2 (-70V),
einzel lens 1 (-100V), einzel lens 2 (WX), einzel lens 3 (-i^Mni),
omega bias lens (--lriV), omega+ lens (4V), omega- lens
(-fiV), quadrupole focus and plate bias lenses (-HH). The
einzel lenses are a traditional electrostatic lens system in
which the voltage on the center lens is different from the
voltage on the other two lenses.

Fig. 4. Example of ion trajectory mapping.

Dual-Mode Detection System

The dynamic range of the ICP-MS system is extended from
six to eight orders of magnitude in the HP 4500 by a newly
developed dual-mode detection system. The electron
multiplier used in the dual-mode system is a discrete dynode type
operated in both pulse count and analog modes.

The block diagram of the dual-mode system is shown in
Fig. 5. When an ion enters the electron multiplier, it hits the
first dynode and a shower of electrons is generated. These
electrons hit the next dynode, generating more electrons.
Finally, the pulse generated is detected by the collector. This
small signal is amplified and a measurable pulse signal is
obtained. At this point, the output signal from the amplifier
contains both electrical noise and the pulse signal. After the
amplifier, the electrical noise is eliminated by a discriminator
circuit and pulse signals higher than the discriminator voltage
are converted to an ideal pulse shape. This pulse is measured
as one count.

At very high analyte concentrations (> 1 mg/l in the sample
solution), detector saturation occurs, so the dual-mode system
is automatically switched to analog mode and the ion
current is measured. The ion current is converted to a
frequency by a voltage-to-frequency converter and measured as
counts per second.

The dual-mode detector system extends the maximum working
range of the instrument up to approximately 100 mg/l.
The appropriate mode for each isotope is selected automatically
by the HP ChemStation operating software, and dual-mode
data is acquired simultaneously, which is another first
for ICP-MS. The great benefit is that samples containing a
range of analytes at different concentration levels can be
analyzed in a single analysis.
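
The mode choice can be illustrated with a short sketch. It assumes the roughly 1 mg/l saturation point and roughly 100 mg/l upper working limit mentioned in the text and, for simplicity, takes the concentration as an input. The real instrument decides from the measured signal, and the names below are hypothetical.

```c
typedef enum {
    MODE_PULSE_COUNT,    /* individual ion pulses are counted */
    MODE_ANALOG,         /* ion current is measured */
    MODE_OUT_OF_RANGE    /* outside the working range */
} det_mode;

/* Approximate figures taken from the text. */
#define SATURATION_MG_PER_L   1.0
#define MAX_RANGE_MG_PER_L  100.0

/* Pick the detection mode for one isotope from its concentration
   in the sample solution. */
det_mode select_mode(double conc_mg_per_l)
{
    if (conc_mg_per_l < 0.0 || conc_mg_per_l > MAX_RANGE_MG_PER_L)
        return MODE_OUT_OF_RANGE;
    return conc_mg_per_l > SATURATION_MG_PER_L ? MODE_ANALOG
                                               : MODE_PULSE_COUNT;
}
```

A sample holding one element at 1 ng/l and another at 50 mg/l would then use pulse counting for the first and analog mode for the second within the same acquisition.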

Without dual-mode operation, dilution, preconcentration, or
other complicated sample preparation steps would be
involved. It is inevitable that as the process for sample
preparation gets more complex, an increasing number of errors
and contamination will occur. Contamination during sample
preparation is always of concern when analyzing elements
at trace levels.

The ShieldTorch System

Although the ICP-MS generates essentially monatomic,
positively charged analyte ions, there are still several polyatomic
ions such as ArO, ArC, and ArH, which arise mainly from the
combination of the argon gas used to generate the plasma
with oxygen, carbon, and hydrogen from the air and the
samples. The main interferences are shown in Table 1.





Table 1
Typical Interferences in ICP-MS

Analyte    m/z    Interferent
K          39     ³⁸Ar¹H
Ca         40     ⁴⁰Ar
Ca         44     ¹²C¹⁶O₂
Cr         52     ⁴⁰Ar¹²C
Fe         56     ⁴⁰Ar¹⁶O
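
The reason each polyatomic ion in Table 1 interferes is simple arithmetic: its nominal mass, the sum of the mass numbers of its atoms, equals the m/z of the analyte. A minimal sketch of that check follows; the types and function names are hypothetical, and singly charged ions are assumed.

```c
#include <stdbool.h>

/* One isotope in a polyatomic ion: mass number and atom count. */
typedef struct {
    int mass_number;
    int count;
} isotope_term;

/* Nominal mass of a polyatomic ion: sum of mass number times count. */
int nominal_mass(const isotope_term *terms, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += terms[i].mass_number * terms[i].count;
    return total;
}

/* A singly charged polyatomic ion interferes with an analyte when
   both fall at the same nominal m/z. */
bool interferes(int analyte_mz, const isotope_term *terms, int n)
{
    return nominal_mass(terms, n) == analyte_mz;
}
```

For example, argon-40 plus oxygen-16 gives 56, the m/z of the main iron isotope, and carbon-12 plus two oxygen-16 atoms gives 44, which falls on calcium-44.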



Fig. 5. Block diagram of the HP 4500 dual-mode detection system.

The HP 4500 can be equipped with HP's proprietary technology
called the ShieldTorch system, which reduces interference
from polyatomic ions. The electrical model of the
plasma and the interface region is shown in Fig. 6. When the
plasma is coupled with the RF coil inductively, the plasma
has only a slight dc potential. However, there is capacitive
coupling between the plasma and the RF coil, which creates
a positive plasma potential oscillating at the radio frequency
of the plasma source.

Within the plasma, positive ions and electrons exist, since
the plasma temperature is high (6,000 to 10,000K). The
numbers of positive ions and electrons are essentially equal, so
the plasma is electrically neutral. Since the sampling cone is
cooled by water, the plasma temperature decreases
dramatically when the plasma comes close to the cone. Positive ions
and electrons do not exist any more and the neutral Ar atom
becomes dominant, creating a "sheath" between the interface
and the plasma. Since the plasma potential is grounded to the
interface and the vacuum chamber through the sheath, it acts
as a condenser and the charge buildup around the sampling
cone results in the formation of a discharge inside the first
vacuum stage, commonly called the secondary discharge.
The secondary discharge ionizes molecules such as ArO,
ArH, and ArAr inside the first vacuum stage, giving rise to
interferences with analyte ions at the same nominal mass.



When the ShieldTorch system is used, a shield plate is
inserted between the torch and the RF coil, eliminating the
capacitive coupling between the plasma and the RF coil
so that the plasma potential is effectively reduced to zero.
As a result, there is no longer a secondary discharge and
polyatomic ions are not ionized behind the sampling cone.
To reduce the polyatomic ions even further, the plasma
temperature is reduced, since these polyatomic ions are
also generated in the plasma itself. By lowering the plasma
temperature, the ShieldTorch system reduces these interferences
dramatically, resulting in improved detection limits
down to ng/l or ppt levels for elements such as Fe, Ca, and
K, typically three orders of magnitude better than without
the ShieldTorch system. Typical spectra with and without
the ShieldTorch system are shown in Fig. 7.

HP ChemStation Operating Software

The HP ChemStation operating software is easy to learn and
use. All instrument parameters are controlled via the HP
ChemStation, unlike traditional ICP-MS systems which were
completely manual before the introduction of the HP 4500.
An example screen from the HP ChemStation is shown
in Fig. 8; this is the instrument control screen. A single click
of the mouse starts the entire system, while system status is
displayed in real time.

Fig. 8. Example screen from the HP 4500 ChemStation.

The HP ChemStation automates day-to-day operation by
employing a suite of autotuning routines. Autotuning
automatically optimizes the sensitivity, background level, and
mass resolution and performs mass calibration. In the tuning
screen, the user can select the tuning actions to be performed
and the target values for sensitivity, oxide and doubly charged
ions, and background. Three masses (typically one each at
low, middle, and high mass) are simultaneously adjusted
using a proprietary algorithm based on the simplex method.
Each ion lens voltage is changed to increase the signal of the
element that has the weakest relative response (ratio of
actual signal to target value) among the three masses until all
the signals satisfy the target values. This allows less
experienced operators to operate the instrument to its full potential.
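
The selection rule described above (work on whichever of the three masses has the weakest ratio of actual signal to target value, and stop when every signal meets its target) can be sketched as follows. This is only an illustration of that rule, not the proprietary simplex-based algorithm; the type and function names are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One tuning mass: the measured signal and the user-selected target value.
struct TuneMass {
    double signal;
    double target;
};

// Relative response as defined in the text: actual signal / target value.
double relativeResponse(const TuneMass& m) {
    assert(m.target > 0.0);
    return m.signal / m.target;
}

// Index of the mass whose relative response is currently the weakest.
std::size_t weakestMass(const std::array<TuneMass, 3>& masses) {
    std::size_t weakest = 0;
    for (std::size_t i = 1; i < masses.size(); ++i) {
        if (relativeResponse(masses[i]) < relativeResponse(masses[weakest])) {
            weakest = i;
        }
    }
    return weakest;
}

// Tuning can stop once every signal reaches its target value.
bool allTargetsMet(const std::array<TuneMass, 3>& masses) {
    for (const TuneMass& m : masses) {
        if (relativeResponse(m) < 1.0) return false;
    }
    return true;
}
```

A tuning loop built on this would repeatedly call weakestMass, adjust a lens voltage in favor of that mass, remeasure, and exit when allTargetsMet returns true.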

Applications 

The HP 4500 ICP-MS offers high-throughput multielement
analysis with ng/l (ppt) or better detection limits, very small
sample volume requirements, robustness, and ease of use.
Therefore, the application areas for the HP 4500 are very
wide, from the semiconductor industry in which the
concentration of analytes is extremely low, to the environmental,
geological, and clinical fields in which high-matrix or "dirty"
samples are analyzed.

Semiconductor Sample Analysis. The trend towards pattern
miniaturization and ultra large-scale integration (ULSI) in
semiconductor devices requires the lowering of the level of
metallic impurities present. In recognition of the need for
higher-purity chemicals to meet the needs of submicrometer
device production, the SEMI Process Chemicals Committee
has proposed several grades for each chemical.



Hydrogen peroxide, H2O2, is widely used to remove metallic,
organic, and particulate contaminants from wafer surfaces
during the semiconductor manufacturing process. The H2O2
must be of extremely high purity to avoid contamination of
the wafer surface by the cleaning solution itself. The
specification for H2O2 (30-32%) in the SEMI Tier C Guidelines (the
quality needed to produce ICs whose critical dimensions lie
in the range of 0.09 to 0.2 µm or greater) stipulates that the
maximum concentration of impurities should be 100 ng/l
(ppt) for a suite of 18 metals. Table II shows the results of
a quantitative purity analysis of H2O2 (30%).

Until now, recovery data presented to the Process Chemicals
Committee by member companies has involved the use of
ICP-MS followed by graphite furnace atomic absorption
spectroscopy (GFAAS) for Ca and Fe. The HP 4500 with the
ShieldTorch system can determine even Fe, K, and Ca at low
ppt levels not normally possible by quadrupole ICP-MS
because of interferences from polyatomic ions and isobars
such as ArO, ArH, and Ar.

Table II also shows the recovery results at the 50 ng/l (ppt)
level. The recoveries of all of the elements were well within
SEMI Tier C Guidelines, which stipulate that recovery data
must be obtained showing 75 to 125% recoveries for all
metals.
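
A spike recovery figure of this kind is conventionally computed as the concentration found in the spiked sample, minus the concentration found in the unspiked sample, divided by the amount of spike added. The sketch below assumes that conventional definition (the article does not spell out its formula) and leaves the acceptance window to the caller; the function names are hypothetical.

```cpp
#include <cassert>

// Percent recovery of a spike, assuming the conventional definition:
// (spiked result - unspiked result) / spike added * 100.
double spikeRecoveryPercent(double unspiked, double spiked, double spikeAdded) {
    assert(spikeAdded > 0.0);
    return (spiked - unspiked) / spikeAdded * 100.0;
}

// True when a recovery lies inside a caller-supplied acceptance window.
bool withinWindow(double recoveryPercent, double lowPercent, double highPercent) {
    return recoveryPercent >= lowPercent && recoveryPercent <= highPercent;
}
```

As a usage example, Table II lists 47 ng/l for Na. With a 50 ng/l spike, a hypothetical spiked reading of 98 ng/l (a value chosen here for illustration, not taken from the article) would correspond to a recovery of about 102%, the figure Table II reports for Na.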

Environmental Sample Analysis. Concerns regarding safe
levels of contaminants in the environment, particularly
heavy metals, continue to grow. The requirement for analysis
of more elements at ever-decreasing concentrations is
exposing the limitations of currently used analytical
techniques. ICP-MS is the only technique that offers the
improvements in sensitivity that will be demanded in the near future.
ICP-MS is approved for several environmental analytical
methods including those developed by the U.S. Environmental
Protection Agency (EPA).

Table II
Quantitative Results for Hydrogen Peroxide (30%)

Element   Concentration   Detection Limit   Recovery
          (ng/l)          (ng/l)            (%)
B         188             4                 [illegible]
Na        47              0.5               102
Mg        8               2                 97
Al        9               3                 102
K         not detected    0.02              101
Ca        34              4                 109
Ti        14              2                 98
Cr        2               1                 101
Mn        1.3             0.1               102
Fe        9               1                 106
Ni        6.5             0.6               98
Cu        2.6             0.4               102
Zn        8               1                 101
As        12              0.7               [illegible]
Sn        4.4             0.5               102
Sb        [illegible]     0.5               104
Au        [illegible]     2                 100
Pb        [illegible]     0.3               [illegible]

Fig. 9 demonstrates the qualitative spectrum of river water
standard reference material (SLRS-3). A large number of
elements, ranging from lithium (Li) at low mass to uranium
(U) at high mass can be clearly observed, even though the
total analysis time was only 100 seconds. Table III shows
HP 4500 ICP-MS quantitative results, which are in excellent
agreement with the certified values. The dual-mode detection
system allows the user to quantitate the analytes from a few
tens of ng/l (ppt) to the mg/l (ppm) level.

Table III
Quantitative Results for River Water

Element   Certified Concentration   Measured Concentration
          (µg/l)                    (µg/l, N = 3)
Be        0.005 ± 0.001             0.0051 ± [illegible]
Na        2300 ± 200                2260 ± [illegible]
Mg        1600 ± 200                1450 ± [illegible]
Al        31 ± 3                    32.3 ± 0.5
K         700 ± 100                 700 ± [illegible]
Ca        6000 ± 400                5720 ± [illegible]
V         0.3 ± 0.02                0.303 ± 0.004
Cr        0.3 ± 0.04                0.303 ± [illegible]
Mn        3.9 ± 0.3                 3.70 ± 0.07
Fe        100 ± 2                   98.7 ± 0.5
Co        0.027 ± 0.003             0.0288 ± 0.0002
Ni        0.83 ± 0.08               0.760 ± 0.003
Cu        1.35 ± 0.07               1.39 ± 0.02
Zn        1.04 ± [illegible]        [illegible]
As        0.72 ± 0.05               0.697 ± 0.007
Sr        28.1*                     30.1 ± 0.2
Mo        0.19 ± 0.01               0.193 ± 0.005
Cd        0.013 ± 0.002             0.0125 ± 0.0002
Sb        0.12 ± 0.01               0.127 ± 0.001
Ba        13.4 ± 0.5                13.3 ± 0.1
Pb        0.068 ± 0.007             0.060 ± 0.003
U         0.045*                    0.0413 ± 0.0008

* Not certified, information value only.
N is the number of repetitions.

Clinical Sample Analysis. The determination of toxic elements
such as mercury (Hg), lead (Pb), and cadmium (Cd) in
humans has been a critical issue in the field of clinical
chemistry from the toxicology viewpoint. In addition, since recent
biomedical research has shown that some elements at trace
levels have specific functions in the biochemistry of living
organisms, the determination of trace element concentrations
in human beings has also become a major issue in the
field of nutritional study. As a result, the analysis of toxic
elements and also many trace elements in biological samples
is required. The analyte concentration range is large,
ranging from the trace levels normally found in the body to
the high levels resulting from industrial exposure. Since
medical treatment regimes for hospital patients depend on
the analytical results reported, the analysis of biomedical
samples is critical. Therefore, the need for fast and reliable
analytical methods and instrumentation is paramount.

Table IV shows the HP 4500 ICP-MS quantitative results for
human hair standard reference material (NIES No. 5) which
was decomposed by a microwave sample preparation system.
The concentrations of 11 elements analyzed were in good
agreement for all the elements that had certified values
(there is no certified value for As).

Table IV
Quantitative Results for Human Hair

Element   Certified       Measured        Detection
          Concentration   Concentration   Limit
          (µg/g)          (µg/g)          (µg/g)
Al        240*            220 ± 6         0.003
Cr        1.4 ± 0.2       1.72 ± 0.07     0.004
Mn        5.2 ± 0.3       5.47 ± 0.13     0.001
Fe        225 ± 9         219 ± 5         0.9
Ni        1.8 ± 0.1       1.87 ± 0.06     0.004
Cu        16.3 ± 1.2      16.7 ± 0.6      0.002
Zn        169 ± 10        171 ± 4         0.004
As        **              0.18 ± 0.02     0.02
Se        1.4*            2.4 ± 0.3       0.004
Cd        0.2 ± 0.03      0.21 ± 0.03     0.0002
Hg        4.4 ± 0.4       4.52 ± 0.15     0.003
Pb        6.0*            5.98 ± 0.11     0.0007

* Not certified, information value only.
** Not certified.

Solid Sample Analysis. Solutions and liquids are the normal
sample types measured by ICP-MS. Solid samples are
normally digested using mineral acids and analyzed as solutions.
However, solid samples such as glass can be analyzed
directly using the laser ablation system. The schematic diagram of
this system is shown in Fig. 10. A sample is placed in the
sample cell and ablated by the beam from a Nd:YAG laser
operating at 266 nm. The fine aerosol generated is carried
directly to the plasma by Ar carrier gas. Fig. 11 shows
qualitative data for glass standard reference material (NIST 614).
Group 1 and 2 elements, transition metals, rare earth
elements, and actinides can be clearly seen from a two-minute
analysis, even though the concentration of most elements
was at the mg/kg (ppm) level or lower in the glass.

Fig. 10. Schematic diagram of laser ablation system.

In addition to the bulk analysis capability shown, this
technique also has the capability to analyze sample features and
inclusions as small as 10 µm in diameter.

Speciation Analysis. Organotin compounds have been widely
used for a variety of commercial applications. Trialkyltin
compounds have been used for antifouling paints for ships
and fish traps. Dialkyltin has been used for polymerization
catalysts. Currently, there is growing concern about their
effects on the environment. Methods to determine the
species of tin (Sn) and the total amount of Sn present
are required, since the toxicity of organotin compounds
varies widely with the number and types of organic groups
attached to the Sn atom. The combination of ICP-MS and
chromatography has the ability to perform speciation
analysis with high selectivity and sensitivity. Fig. 12 shows a
chromatogram of six organotin compounds obtained by the
HP 4500 ICP-MS combined with the HP 1050 liquid
chromatograph. Each organotin compound was separated clearly
within a total run time of 20 minutes. Detection limits
obtained were 24 to 51 pg as Sn.

Summary 

The HP 4500 ICP-MS offers high sensitivity, low background,
a wide dynamic range, and the reduction of polyatomic ions,
even though its benchtop size is only one fifth the size of the
previous model. It is designed for routine use, easy operation,
and easy maintenance. With these features, the HP 4500 is
ideal for a wide range of applications in the semiconductor
industry, environmental studies, laboratory research, plant
quality control, and other areas.

Audit History and Time-Slice
Archiving in an Object DBMS for
Laboratory Databases



Development of an object database management system allows rapid,
convenient access to large historical data archives generated from
complex databases.

by Timothy R Loomis 



The requirements for laboratory databases include many
of the same features specified for other types of databases,
including enforcement of a rigorous transaction model,
support for concurrent users, distributed recovery capabilities,
performance, and security. However, the requirements differ
from most databases by the emphasis on saving a complete
and recoverable record of historical data for some types of
data. This requirement comes from the regulatory overseeing
authority of the pharmaceutical industry by organizations
such as the U.S. Government's Food and Drug Administration
or Environmental Protection Agency, and often, the
legal importance of the data (patent law). Some examples
of historical data in a chemical laboratory include previous
values of test results, designated reviewers and approvers
of data, methods of analysis, and ingredients used to
produce a product. It is necessary to be able to determine when
this data changed, who changed it, and why a change was
necessary.

Most laboratory database systems have tried to deal with
historical data by adding complex logic to the application
code to record and retrieve historical data in special tables
that are added to traditional relational database schemas.
While this technique works for simple schemas with a few
objects that need to be monitored for change, its complexity
overwhelms development, testing, and support efforts for
more realistic databases. In short, it does not scale to the
complex databases needed for the future.

Keeping track of historical data became a critical design
factor when the HP ChemStudy product was being developed
in the laboratory information management system program
in HP's Chemical Analysis Solutions Division. HP ChemStudy
controls all the information used in multiyear projects
that determine the expiration dates on drugs. The database
is complex with 128 types of application objects
interconnected through numerous relationships. It is necessary to
be able to reproduce the contents of objects and the state of
their relationships at any time in the past to satisfy regulatory
requirements.

Our solution to the historical data challenges of laboratory
databases has been to develop a database management system
(DBMS) that provides built-in support for historical data
for any object and for groups of objects that are connected
through relationships. The simplicity and extensibility of this
system are possible because we have developed a pure object
DBMS (ODBMS) in which relationships are themselves
objects. Although the ODBMS provides many advantages for
applications development, this article will concentrate on
the issue of historical data.



Glossary

Commit. The database operation that makes changes by a user
permanent in the database and visible to other users.

Component Object. An object that is contained logically by another
(composite) object.

Composite Object. An object that logically contains other (component)
objects.

Exclusive Lock. A database mark placed on an object on behalf of
a user to prohibit another user from obtaining a lock or modifying the
object.

Foreign Key. A way of identifying data from a row in one table that is
duplicated in a row in another table to logically relate the two rows.

Pessimistic Concurrency. The model of database design and
programming that obtains exclusive locks on data to be updated to ensure that
the commit operation will not identify conflicts with other users and fail.
The other end-member model, optimistic concurrency, avoids obtaining
locks but risks a commit failure.

Rollback. The database operation that discards changes by a user,
returning the database to its state before the transaction began.

Save Point. During the process of modifying objects before commit, a
user can mark a save point and later roll back to this state rather than to
the beginning of the transaction, discarding later changes to objects.
Save points are removed at commit or rollback.

Schema. A description of the tables, the data within tables, and the
logical relationships among data for a relational database. This is often
extended to any description of database data and relationships in
general.

Logical Transaction. A collection of database modifications that
should be implemented completely or not at all.

Two-Phase Commit Protocol. A method of commit that tries to verify
that all databases participating in a logical transaction can implement
their part of the transaction before implementing the changes in any
database. This is used to avoid only partially implementing a logical
transaction in a distributed database system.



The ODBMS is implemented in C++ on the HP-UX operating
system and Windows NT.



System Overview 

Before considering the details of how historical data is
managed in the database, we need an overview of the distributed
ODBMS to understand how an object is created and stored.
While this modular system can be configured in many ways,
Fig. 1 presents an example configuration that is used in the
HP ChemStudy product.

In Fig. 1, a client is a process that incorporates C++ class
code that defines application objects. While the object
created by the application can contain any data needed in the
application, the object is managed (locked, updated, saved)
through the services of the generic object manager module.
The object manager also controls logical transactions
(commit and rollback) and provides save points and other DBMS
functions. At the object manager level, all objects are treated
alike and no changes are required to support any new
application object types. The client may have a user interface
(shown as a graphical user interface (GUI) in Fig. 1) or it
could be an application server with code to support its own
clients.

The object manager can connect to one or more object
servers that control a database. The ability to connect to
multiple object servers makes the system a distributed
DBMS and necessitates a two-phase commit protocol to
ensure that a transaction affecting multiple databases works
correctly. The distributed capabilities of the ODBMS are
employed for archiving operations (described below) and
for integrating data from multiple active databases.
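
The two-phase commit idea mentioned above can be sketched in a few lines: first ask every participating server whether it can commit its part, and only if all agree tell each one to commit; otherwise roll everything back. The interface and class names below are hypothetical and are not the ODBMS's actual API; a real protocol also has to handle failures between the two phases, which this sketch ignores.

```cpp
#include <cassert>
#include <vector>

// Hypothetical view of one object server taking part in a transaction.
class Participant {
public:
    virtual ~Participant() = default;
    virtual bool prepare() = 0;   // phase 1: can this server commit its part?
    virtual void commit() = 0;    // phase 2: make the changes permanent
    virtual void rollback() = 0;  // discard the changes
};

// Commit on all participants or on none. Returns true if committed.
bool twoPhaseCommit(const std::vector<Participant*>& participants) {
    for (Participant* p : participants) assert(p != nullptr);
    // Phase 1: every participant must vote yes.
    for (Participant* p : participants) {
        if (!p->prepare()) {
            // One refusal aborts the whole logical transaction everywhere.
            for (Participant* q : participants) q->rollback();
            return false;
        }
    }
    // Phase 2: all voted yes, so apply the changes in every database.
    for (Participant* p : participants) p->commit();
    return true;
}

// In-memory stand-in for a server, used only to exercise the sketch.
class FakeServer : public Participant {
public:
    explicit FakeServer(bool canCommit) : canCommit_(canCommit) {}
    bool prepare() override { return canCommit_; }
    void commit() override { committed = true; }
    void rollback() override { rolledBack = true; }
    bool committed = false;
    bool rolledBack = false;
private:
    bool canCommit_;
};
```

In the archiving case described in the text, the two participants would be the active database and the archive database, so data is never removed from one without having been stored in the other.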



Currently, we provide two types of object servers which
differ only in the driver code module that stores object data.
From the point of view of a client process, there is no
difference in the way an object is treated. The Oracle object server
stores an object in Oracle tables while the file object server
stores the object in one or more redundant file structures as
object data. While the file version is faster than Oracle for
read and write by a factor of [illegible], some customers prefer
the Oracle version because it conforms to their corporate
information systems requirements. The file version also
stores data more compactly and is ideal for embedded
databases that are not visible to users and for databases in
which the speed of storing and retrieving data is critical.
Because the object data stored by either type of server is
binary, multimedia data or a binary file can be stored by
breaking the data into objects. Objects are also useful for
processing a large binary data file in clients that do not have
enough memory to hold all the data at once.

Laboratory databases become so large that it is necessary to
remove old data periodically from the active database and
place it in some type of archive for long-term storage. Most
systems have used a special storage medium for archived
data and require that the data be dearchived back to the
active database for review. Instead we use the distributed
capabilities of the ODBMS to transfer data from the active
database to an archive database as a simple distributed
database transaction. The archive database can then be taken
offline without limiting current operations. Fig. 1 shows an
Oracle server being used for the active database and a file
version being used for an archive database.

The object database provides access for C++ object
applications but lacks facilities for ad hoc queries and reports that
can be customized by a customer. To accommodate ad hoc
queries and report writers, a collection of mapped tables
can be created that provide a more traditional relational
database schema of the application data. Each type of C++
object can be mapped to its own table in the map schema
when it is inserted or updated, but it is always read by the
application from the object database. In practice, only some
data in selected objects is mapped. This object-relational
DBMS combination has proven to be very successful at
providing the customer with reporting flexibility while
preserving the speed and simplicity of a pure object system for the
application code.

An example of mapping is shown in Fig. 2. The example
considers three objects of three different types: Dept, Emp,
and EmpList (relationship). A client connected to the object
server transports binary objects to and from the server
cache. Except for objects newly created by a client, all
objects in the cache have persistent counterparts in the object
database and are read into the cache from this database. All
objects are inserted or updated in the object database during
the commit operation. At the option of the application
designer, selected data from an object can also be mapped to
the map database as shown for the Dept and Emp objects. The
EmpList relationship object is not mapped in this example.
Relationships are usually defined using foreign keys in
relational schemas.
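
The division of labor described above can be sketched as follows: at commit, the whole object goes to the object database, a subset of its data is copied to a mapped table for reporting, and the application always reads from the object database. The Emp fields, the MapRow layout, and the Store class are hypothetical names chosen for the example; the article does not give the actual layout.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical application object; the field names are illustrative only.
struct Emp {
    long id;
    std::string name;
    double salary;
    std::string notes;  // data that is deliberately not mapped
};

// Hypothetical row in the mapped (relational) table: selected data only.
struct MapRow {
    std::string name;
    double salary;
};

class Store {
public:
    // On commit the full object is inserted or updated in the object
    // database, and the selected fields are inserted or updated in the map.
    void commit(const Emp& e) {
        objectDb_[e.id] = e;
        mapTable_[e.id] = MapRow{e.name, e.salary};
    }
    // The application always reads the object from the object database.
    const Emp& read(long id) const {
        assert(objectDb_.count(id) == 1);
        return objectDb_.at(id);
    }
    // Report writers and ad hoc queries would look only at the map table.
    const std::map<long, MapRow>& mapTable() const { return mapTable_; }
private:
    std::map<long, Emp> objectDb_;
    std::map<long, MapRow> mapTable_;
};
```

Keeping the map as a write-only side effect of commit is what lets the application code stay purely object-based while reports use ordinary relational tools.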

We can see from this overview that an object is a bundle of
data that can exist simultaneously as a C++ object in multiple
clients, as an object in the cache in the object server, as
object data in a database, and as mapped data in a relational
table. Managing the relationships among these multiple
representations of an object requires adherence to a
rigorous transaction model. Many of the features necessary to
deal with historical versions of an object are extensions to
controls that already exist for object data.



Auditing Laboratory Data

There is more to a database data item than a value that can
be retrieved. For example, that value was created by
someone or some calculation, it may have been converted from
a string representation with a specific precision, it was
created at some date and time, it may have some
application-specified limits that cannot be exceeded, and so on.
Moreover, the current value may have replaced a previous value,
requiring a justifying comment, and it may be necessary to
retrieve all earlier values of this data item. It has long been
a requirement for laboratory databases to maintain this type
of information associated with a laboratory measurement
and to record a history of changes to the measurement.
We generally refer to the process of maintaining a record
of a value and its associated information through time as
auditing or maintaining an audit trail. In the context of
an object database, auditing means keeping a record of the
history of an object and objects associated with it through
relationships.

Auditing database data has generally meant keeping a
separate record or audit log of selected changes made to the
database. For example, Oracle provides the capability to
audit user, action, and date for access to selected object
types but requires a user to write triggers to record changes
to data values. While this straightforward mechanism does
accomplish the task, its use for large and complex databases
rapidly generates huge volumes of data that require
sophisticated searching to identify particular changes of interest.
A simple audit log of database changes is practical only if one
hopes that it will never be needed! Audit logs are routinely
needed in the pharmaceutical industry and will soon be a
common requirement for other industries subject to
regulatory oversight such as software development processes
subject to ISO validation. Searching through a huge audit log
is not a reasonable way to answer an auditor's questions
about the history of an object that may contain, or be
associated with, hundreds of component objects.

The alternative to an external audit log is a DBMS that has
an intrinsic method for auditing an object and its
relationships. In the next section we discuss general methods
developed to audit selected classes of composite objects stored in
an ODBMS so that the audit data can be retrieved easily.

The subject of temporal databases has received
considerable research attention directed mainly toward extending
the relational model and providing time-based query
methods. The implementation presented here differs
from these models principally by:

• Using an object model

• Using relationship objects together with lock-and-update
propagation to synchronize the time history of related
objects, rather than attempting to deal with the more
general problem of "joining" any set of objects

• Being a working implementation for audit-trail applications
that deals with load errors and numerous practical
programming problems.

Commercial extended relational databases such as Illustra
are beginning to provide some time-based capabilities for
specialized data.

Example Schema 

Auditing an object is complicated by references to other
objects. Consider Fig. 3, which shows an abbreviated class
schema for a division of a company containing departments,
department offices, and employees within departments.
Relationship classes (objects) derived from the class list are
shown explicitly in this diagram because they are important
in auditing. (For clarity all lists are shown as separate
classes rather than as inherited base classes.) A reference to
another object is shown explicitly as an arrow in this
diagram because we will be concerned with the details of
propagation of information between objects. A line terminated
with a dot (as in IDEF1X or OMT modeling) indicates that
references to multiple objects can be stored. An A in the
lower-right corner of a class indicates that objects in that
class are audited.

Composite Objects. Audited relationships should be used to
contain the components of a composite object. A composite
object is one that can be considered to logically contain
other component objects in the application. More precisely
for our purposes, a composite object can be defined as one
that should be marked as changed (updated) if component
objects are added, deleted, or changed even if the data
within the composite object itself remains unmodified.

In the example of Fig. 3, we will consider a Dept to be a
composite object because it logically contains Emp component
objects. An EmpList object is the relationship or container
connecting the composite and its components. We consider
Dept to be a composite object in this example because we
implicitly include all the employees in the department as part
of the department and want to consider the department to
be modified if there are any changes to any of the employees.
Alternatively, we could have considered Dept to exist
independent of its employees. Clearly we can sink into the dark
waters of a long philosophical discussion here (If you change
the engine in the car is it the same car?), so the design is
best approached physically. The basic question is whether
examination of the history of a composite object should
reflect changes to its component objects. For many complex
objects in our products the answer is yes.
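
The working definition above (a composite counts as changed whenever one of its components changes, even if its own data is untouched) can be sketched with a simple change counter. The Node type, its parent link, and markChanged are hypothetical names chosen for the example, and the counter merely stands in for whatever the ODBMS records to mark an object as updated.

```cpp
#include <cassert>

// Hypothetical object with a change counter and a link toward the object
// that logically contains it (Emp -> EmpList -> Dept in the example schema).
struct Node {
    int revision = 1;
    Node* parent = nullptr;
};

// Mark an object as changed, along with every object that contains it.
// Sibling components are not touched.
void markChanged(Node& changed) {
    for (Node* p = &changed; p != nullptr; p = p->parent) {
        assert(p->revision > 0);
        ++p->revision;
    }
}
```

With a Dept containing an EmpList that holds two Emps, changing one Emp marks that Emp, the EmpList, and the Dept as changed, while the other Emp is left alone.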

Two references are necessary for an audited relationship.
References traversed from Dept to Emp are called component
references and the reverse references are called back
references.
Audited and Nonaudited Objects. As exemplified by the use of
classes derived from the list class in audited and nonaudited
relationships, auditing can be specified on a subclass or an
individual object. Moreover, it is permissible to turn auditing
on only after some event in the life of an object. For the
moment, we consider only the case where an object in an
audited class is audited from inception.

We see in Fig. 3 that objects of the composite Dept class
should be audited from creation but that DeptOffice and
Division are never audited. Semantically, this design means that
the history of a Dept object, including the composition of all
of its component Emps, can be retrieved at any stage of its
history. In contrast, the DeptOffice for the Dept and the list of
Depts in the Division can be retrieved only for their current
values.

Audit Mechanism

Auditing Objects. Auditing an object means that all images
of the object must be maintained in the database, starting
with the image that existed when auditing was turned on.
In contrast, only the latest (current) image of a nonaudited
object is retained. Note that when an audited object is to be
written to the database, the decision to replace the old image
depends on whether the old one was audited. Successive
object images generated through update will be referred to
as revisions of the object, whether the object is audited or
not. The revision number is used by the ODBMS to ensure
that a client is working with the correct image of an object.
There can be only one current revision of an object and only
the current revision can be updated.
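
The difference between audited and nonaudited storage can be sketched as follows for the simplest case, an object that is either audited from inception or never audited: every write produces a new revision number, but only an audited object keeps its earlier images. The class and member names are hypothetical and do not reflect the ODBMS's actual storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One stored image of an object.
struct Image {
    int revision;
    std::string data;
};

// Hypothetical store for a single object's images.
class ObjectHistory {
public:
    explicit ObjectHistory(bool audited) : audited_(audited) {}

    // Writing always yields the next revision number. A nonaudited object
    // replaces its old image; an audited object keeps every image.
    void write(const std::string& data) {
        int next = images_.empty() ? 1 : images_.back().revision + 1;
        if (!audited_ && !images_.empty()) images_.pop_back();
        images_.push_back(Image{next, data});
    }

    // The single current revision, the only one that may be updated.
    const Image& current() const {
        assert(!images_.empty());
        return images_.back();
    }

    std::size_t storedImages() const { return images_.size(); }

private:
    bool audited_;
    std::vector<Image> images_;
};
```

After three writes, both kinds of object report revision 3 as current, but the audited one holds three images while the nonaudited one holds a single image.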



[Figure; axis labels: Object, Time]



The term version is used for the concept of
variations of an object that can all be current. For example,
different versions of a glossary can exist for different
languages but each version may undergo revision to add terms
or correct errors. An object is also marked with a commit
timestamp, which is exactly the same for all objects in a
(possibly distributed) transaction. These attributes of an
object, along with its identifier and other data, are contained
in a header that is prepended to the object in the database
and maintained separately by the C++ object in the client
object manager.
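
The bookkeeping described above can be sketched in a few lines of
Python. This is only an illustration, not the product's C++
implementation: the names Header, Store, and write are hypothetical,
and the in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Header:
    """Hypothetical header prepended to each stored image."""
    oid: int          # object identifier
    revision: int     # count of committed changes to the object
    commit_ts: float  # identical for all objects in one transaction
    audited: bool


@dataclass
class Store:
    """Toy stand-in for the database: oid -> list of (Header, data)."""
    images: dict = field(default_factory=dict)

    def write(self, oid, data, commit_ts, audited):
        history = self.images.setdefault(oid, [])
        revision = history[-1][0].revision + 1 if history else 1
        # Replace the old image only if the old image was not audited.
        if history and not history[-1][0].audited:
            history.pop()
        history.append((Header(oid, revision, commit_ts, audited), data))
        return revision


store = Store()
store.write(1, "office A", 10.0, audited=False)
store.write(1, "office B", 11.0, audited=False)   # overwrites revision 1
store.write(2, "dept v1", 10.0, audited=True)
store.write(2, "dept v2", 11.0, audited=True)     # revision 1 is retained
```

The nonaudited object ends up with a single image carrying revision
number 2, while the audited object keeps both of its revisions.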

Auditing Relationships. Auditing relationships requires some
mechanism for recording the history of the relationship.
Rather than implement a database relationship mechanism
and audit it separately from auditing objects, it makes sense
to implement relationships as objects themselves. Auditing a
relationship is then no different than auditing an object.

Deleting Audited Objects. Deleting an object becomes
complicated when the object is audited because the object still
exists in the database until the delete is committed. The
delete action must be represented in the database somehow,
so that the timestamp and revision number marking the end
of its life are available. We use a pseudo-object for this
purpose. Accessing audited objects, or portions of their history,
may involve actually referencing and loading these
pseudo-objects representing the delete operation.

Update Propagation. An important objective of the audit
mechanism should be to update the minimum amount of
information to document a change fully. For this reason
we reject the simple "archive copy" approach to auditing
whereby the entire composite object is copied each time
a component changes. Thus, we should not simply make
a copy of the entire Dept composite hierarchy just because
an Emp changed because this produces a huge amount of
redundant data.

Auditing a composite hierarchy is implemented in our
system by propagating the update of a component through the
relationship and composite parent objects using back
references. For example, updating member data in an Emp object
will trigger an update in the EmpList and Dept but will not
necessitate an update or copy of other Emps or of other
components of Dept. It is necessary to mark composite
objects as updated even though their member data has not
changed because the composite they represent has changed.
Note that there is nothing to be gained by updating a
nonaudited object that references an audited one because it
does not have a history corresponding to the past history of
the referenced object. Therefore, for example, Division is not
updated when Dept changes.

It is impractical to expect programmers to follow these back
references each time they update an object. It is also asking
for bugs to expect them to qualify the propagation correctly
according to audit state and update type. We have solved
this problem by incorporating back references implicitly
within relationship objects and component objects. The
object manager code propagates updates automatically as
appropriate.
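
A minimal sketch of this propagation, assuming each object carries a
single back reference to its parent, might look as follows. The Node
class and propagate_update function are hypothetical names, and a real
object manager would also ensure that an object is marked at most once
per transaction.

```python
class Node:
    """Toy object with an implicit back reference to its parent."""

    def __init__(self, name, audited, parent=None):
        self.name = name
        self.audited = audited
        self.parent = parent   # relationship or composite that holds us
        self.revision = 1


def propagate_update(node):
    """Mark node as updated, then follow back references upward.

    Propagation stops at the first nonaudited parent, because a
    nonaudited object has no history to keep consistent.
    """
    touched = []
    current = node
    while current is not None:
        current.revision += 1
        touched.append(current.name)
        parent = current.parent
        if parent is None or not parent.audited:
            break
        current = parent
    return touched


# The example schema: Division and DeptList are not audited.
division = Node("Division", audited=False)
dept_list = Node("DeptList", audited=False, parent=division)
dept1 = Node("Dept1", audited=True, parent=dept_list)
emp_list1 = Node("EmpList1", audited=True, parent=dept1)
emp1 = Node("Emp1", audited=True, parent=emp_list1)
emp2 = Node("Emp2", audited=True, parent=emp_list1)

touched = propagate_update(emp2)
```

Updating Emp2 touches Emp2, EmpList1, and Dept1 only. Emp1, DeptList,
and Division keep their revision numbers.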

The audit contents of a database can be illustrated using
Fig. 4, an example history of a part of the example schema
in Fig. 3. The number shown for each object at a particular
time is its revision number, a simple count of the number of
database transactions that have changed the object. We see
that Division has not been changed since it was created.
DeptList was created at the same time as revision 1 but has been
modified twice since then (when Dept1 and then Dept2 were
added). Since DeptList is not audited, only the last revision
(revision 3) exists in the database.

The behavior of audited objects is different. Dept1 and its
EmpList1 were added to the DeptList as revisions 1. When Emp1
was added to EmpList1, the update was propagated to Dept1 as
well as EmpList1 so that the revision of the composite object
Dept1 reflects a change to one of its components. The same
thing happens when Emp2 is added. Note that Emp1 is not
updated in this operation, nor does the update propagate to the
nonaudited DeptList. A subsequent update of Emp2 (revision 2)
similarly causes propagated updates to EmpList1 and Dept1. To
make the example interesting, Emp2 has been deleted,
represented by the creation of the pseudo-object with revision
number 3D. This object really exists in the database as a
marker of the end of the life of Emp2 (figuratively, we hope).
Just as for an update, this delete operation causes an update
of EmpList1 and Dept1.

Lock Propagation. For pessimistic concurrency models it is
necessary to acquire an explicit lock on all objects to be
updated at commit. Consequently, the object manager
should propagate exclusive locks in the same way that it
propagates updates and be able to deal with restoring locks
to their original type if the propagation should fail partway
through.
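
The restore-on-failure requirement can be illustrated with a small
sketch. The lock table layout (object name mapped to a mode and an
owner) and the names LockError and propagate_exclusive_locks are
assumptions made for the example, not the system's actual interfaces.

```python
class LockError(Exception):
    """Raised when a lock in the chain cannot be acquired."""


def propagate_exclusive_locks(chain, lock_table, owner):
    """Take exclusive locks along chain (component first, then parents).

    lock_table maps object name -> (mode, owner). If any lock is held
    by another owner, every lock changed so far is put back to what it
    was before the call, and LockError is raised again.
    """
    saved = {}
    try:
        for name in chain:
            held = lock_table.get(name)
            if held is not None and held[1] != owner:
                raise LockError(name)
            saved[name] = held
            lock_table[name] = ("exclusive", owner)
    except LockError:
        for name, held in saved.items():
            if held is None:
                del lock_table[name]       # there was no lock before
            else:
                lock_table[name] = held    # restore the original type
        raise


locks = {"EmpList1": ("shared", "me"), "Dept1": ("exclusive", "other")}
snapshot = dict(locks)
try:
    propagate_exclusive_locks(["Emp2", "EmpList1", "Dept1"], locks, "me")
    failed = False
except LockError:
    failed = True
```

Because Dept1 is locked by another client, the call fails, and the
shared lock on EmpList1 is restored rather than left exclusive.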

Audit Log. Another objective is to summarize changes to the
composite Dept object in one place. In this example, suppose
there are several changes to each of three Emps and to some
other components (not shown) in a single transaction. The
update mechanism records the fact in the Dept object that
something changed in at least one component object in this
transaction, but we need the AuditLog text object to itemize the
specific changes bundled in that transaction. Fig. 3 shows a
list of AuditLog objects hanging from Dept. Each AuditLog object
summarizes the changes for the composite Dept object during
a transaction. From the user's point of view, a convenient
implementation is to generate one-line entries in the log
automatically for each change the application makes to a
component object or the composite object, and then require
the user to add only a summary comment before commit.

Object Access 

Revision and Time Retrieval. An audited object can be retrieved
from the database by specifying either a specific revision of
the object or by specifying an absolute time and finding the
object that was current at that time. A special time token
represents current time (also known as NOW in the
literature), corresponding to the most recent object revision.
Accessing objects by absolute time requires that the commit
timestamp of an object be determined so that it corresponds
correctly to the actions of multiple clients in a distributed
database environment. A consistent source of time must be
available to all clients and time must be specified precisely
enough to distinguish two transactions on a fast network.

An example is the best way to explain why both access
methods are needed. A common way to query the database
history in Fig. 4 would be to locate the current Dept1 and then
ask to see each of its previous revisions. Retrieving revision 5
of Dept1, the system would use its commit timestamp to
retrieve revision 1 of Emp1 and not find Emp2 because it was
deleted at this time in EmpList1. Moving back in time to
revision 4 of Dept1, its EmpList1 would recover revision 1 of Emp1
again and also find revision 2 of Emp2. Instead of starting
with the current revision of Dept1, the initial query could have
specified any absolute time, say one somewhere between
revisions 2 and 3 of Dept1 to find revision 2 of Dept1, then
the commit timestamp of revision 2 would be used to find
component data.
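
The two access methods can be sketched with a sorted list of commit
timestamps per object and a binary search. The History class, the
DELETED marker, and the integer timestamps 1 through 5 are assumptions
chosen to replay the history described for Fig. 4.

```python
import bisect

DELETED = object()   # stands in for the delete pseudo-object


class History:
    """All revisions of one audited object, oldest first."""

    def __init__(self):
        self.stamps = []   # commit timestamps, ascending
        self.images = []   # revision n is stored at index n - 1

    def commit(self, ts, image):
        self.stamps.append(ts)
        self.images.append(image)

    def by_revision(self, n):
        return self.images[n - 1]

    def revision_at(self, ts):
        """Revision number current at ts, or None if absent then."""
        i = bisect.bisect_right(self.stamps, ts)
        if i == 0 or self.images[i - 1] is DELETED:
            return None
        return i


dept1, emp1, emp2 = History(), History(), History()
for ts in (1, 2, 3, 4, 5):            # Dept1 revisions 1 to 5
    dept1.commit(ts, f"Dept1 r{ts}")
emp1.commit(2, "Emp1 r1")             # added: Dept1 becomes revision 2
emp2.commit(3, "Emp2 r1")             # added: Dept1 becomes revision 3
emp2.commit(4, "Emp2 r2")             # updated: Dept1 becomes revision 4
emp2.commit(5, DELETED)               # revision 3D: Dept1 becomes revision 5

ts_rev5 = dept1.stamps[5 - 1]         # commit timestamp of Dept1 revision 5
ts_rev4 = dept1.stamps[4 - 1]
rev_between = dept1.revision_at(2.5)  # an absolute time between revisions 2 and 3
```

Using the commit timestamp of Dept1 revision 5 finds Emp1 revision 1
and no Emp2, while the timestamp of revision 4 also finds Emp2
revision 2.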

Multiple Revision Management. A consequence of auditing
objects is that multiple revisions of the same object can exist
in the client cache at the same time. This presents a number
of practical problems for application programmers who
need a simple mechanism for specifying the correct object
revision to access. We have found that extending the
meaning of locking an object to include cache management of old
and current revisions of an object as well as the traditional
meaning of granting an explicit lock on the object is a
practical solution to this problem.

Accessing Objects through References. Mixing audited and
nonaudited objects in the same application exposes the
implementer to numerous opportunities to generate run-time
database load errors. Despite the problems of a schema with
both audited and nonaudited objects, it is often necessary
to mix the two to avoid creating impractical quantities of
data in the database. A few referencing rules, if they can be
enforced, solve the problems.

• Rule 1: Current access to nonaudited objects. A nonaudited
object must always be accessed as a current-time object,
meaning the latest one available from the database. For
example, all revisions of Dept use current time when
accessing DeptOffice because old revisions of DeptOffice do not exist.
If an old time were specified in the access request and
DeptOffice had not been changed, the access would succeed, but
a few minutes later, after DeptOffice had been updated by
another client and its timestamp had changed, the same
request would fail!

This rule is simple enough but does introduce some
opportunities for apparently inconsistent behavior. For example, if a
report generated for a Dept uses the reference to DeptOffice to
include its room number, the same report repeated later on
the same revision of the Dept could have another room
number if DeptOffice had been changed. Worse, the DeptOffice could
have been deleted from the Division, causing a load error.
These apparent problems are not the fault of the database
system but rather intrinsic in the heterogeneous schema.
They are solved either by auditing DeptOffice or by indicating
that DeptOffice is deleted by status data within the object
rather than deleting the object.

• Rule 2: Qualified access from nonaudited to audited objects.
As explained above, an access time or specific revision
number must be specified when accessing an audited object. For
example, the Division can reference a Dept in three ways: by
specific revision, by current time (meaning the latest
revision), or by absolute time. In practice, a user does not
generally know a specific revision of the Dept object or a specific
commit timestamp. Therefore the most useful access times
are current time or an absolute time the user specifies for
some reason.

A continuing complication when accessing audited objects
is that the object exists at some times but not others. For
example, if we delete the Dept when it is transferred out of the
Division, we can't simply delete it from the DeptList because we
may need to access the old Dept information in the future.
Thus, the reference to a Dept should be tested for accessibility
before we try to load it for a specific time to avoid a load
error. These problems are solved if we simply audit the
DeptList and Division.

• Rule 3: Self-timestamp access between audited objects. The
easy and foolproof way for an audited object to access
another audited object is to use its own commit
timestamp. Furthermore, it is permissible for an audited object
to drop a reference when the object is deleted (or for any
other reason) because its previous revisions will still have
the reference. However, there are some complications.

It may be necessary for an object to access the same object
in different ways. Suppose the DeptOffice in Fig. 3 were
audited. If we create a report on a revision of Dept and include
DeptOffice information, the method in Dept creating the report
should use its timestamp access to DeptOffice to get
contemporaneous information. However, if a Dept method is
programmed to update the DeptOffice, say with its identification
information, it is important that the current DeptOffice be
accessed because only a current object can be updated.
As long as the Dept is updated first, timestamp access can
be used for both but it will not work if the update in Dept
is marked after accessing DeptOffice. In general, it is safer to
code current access explicitly when updating a referenced
object.

Midlife Changes of an Object. It is permissible to change an
object from nonaudited to audited at some time in its life.
Probably the most common reason to do this is to avoid
generating large amounts of data while an object is in some
draft stage and being updated frequently. Keep in mind that
the object can be a composite object hierarchy
encompassing hundreds of large objects. Only after some approval
stage does the application really want to track the life of this
composite construct.

Making an object audited may change the rules it uses to
access component objects and propagate updates. By
implementing these mechanisms in object manager utilities,
the change can be made transparent to most application
developers.

Schema Constraints 

The previous discussion leads to a simple rule for auditing
classes in a schema: audit the components and relationships
if the composite is audited. For a composite object to truly
represent the state of a component hierarchy, all the
components and component-composite relationships beneath the
composite must be audited when the composite is audited.
Only then will locks and updates be propagated correctly
and can the composite use its timestamp to access its
components reliably.
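
A schema checker for this rule could be sketched as below. The
dictionary encoding of the schema, the function name
audit_violations, and the placement of DeptOffice and AuditLog
directly beneath Dept are assumptions for illustration, and the
sketch assumes the composite hierarchy contains no cycles.

```python
def audit_violations(schema, audited):
    """Return (composite, component) pairs that break the rule.

    schema maps a class name to the components and relationships
    directly beneath it. audited is the set of audited class names.
    """
    violations = []

    def walk(root, node):
        for child in schema.get(node, []):
            if child not in audited:
                violations.append((root, child))
            walk(root, child)

    for cls in schema:
        if cls in audited:
            walk(cls, cls)
    return violations


schema = {
    "Division": ["DeptList"],
    "DeptList": ["Dept"],
    "Dept": ["EmpList", "DeptOffice", "AuditLog"],
    "EmpList": ["Emp"],
}
audited = {"Dept", "EmpList", "Emp", "AuditLog"}
problems = audit_violations(schema, audited)
```

With this encoding the only reported pair is Dept with DeptOffice,
which is the exception to the rule that the text discusses.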

For example, Fig. 3 shows the AuditLog as audited even though
we expect to create only a single AuditLog revision for each
transaction. Marking it audited follows the rule to acquire
the programming simplifications enumerated above. There
is really no penalty in this case, because storing one revision
of an audited object takes no more room than storing one
revision of a nonaudited one.

There are reasons for breaking this rule. In large realistic
systems (in contrast to small demonstration ones) we face
realistic constraints on space and often somewhat ambiguous
application requirements. As an example, consider DeptOffice,
which is marked as nonaudited in Fig. 3. If we assume that
there are good application reasons for not auditing DeptOffice,
we have to carefully access the references between Dept and
DeptOffice according to the complications discussed above
and accept the apparent inconsistencies that these
relationships may produce.

Database Storage 

Object storage implementations are beyond the scope of
this article, but it is worthwhile to mention a couple of
considerations. First, it is not necessary to have a specialized
database to store audited objects. We have implemented an
auditing database that can use either Oracle tables or our
own file storage manager. The main complications are:

• Providing an efficient access method that will find an object
current at a time that does not necessarily correspond to a
timestamp

• Handling pseudo-objects representing delete.

Second, it is advisable to provide efficient access to current
objects. Because audited objects are never deleted it is not
unreasonable to expect hundreds of copies of an object in an
old database. Most applications will primarily access the
current revision of an object and have to stumble over all the
old revisions unless the storage manager distinguishes
current and old audited data. It may be worth introducing some
overhead to move the old revision of an object when a new
revision appears to maintain reasonable access efficiency.



Some object database systems map object data to relational
tables. The relational system can represent the primary
object depository or, alternatively, only selected data can
be mapped to enable customers to use the ad hoc query
and report-writing capabilities of the relational database
system. Extending these systems to handle audited data
simply requires adding a revision number, timestamp, and
object status code to the mapped data. The ad hoc user
should be able to formulate the same type of revision and
time dependent queries of the relational database as a
programming language does of the object database. The status
is necessary to distinguish old audit data, current objects,
and deleted pseudo-objects.
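
As a rough illustration of such a mapping, the sketch below uses
Python's built-in sqlite3 module in place of a production relational
database. The table name, column names, and status values ('old',
'current', 'deleted') are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE emp (
           oid INTEGER,
           revision INTEGER,
           commit_ts INTEGER,
           status TEXT,      -- 'old', 'current', or 'deleted'
           name TEXT)"""
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (2, 1, 30, "old", "Emp2"),
        (2, 2, 40, "old", "Emp2 updated"),
        (2, 3, 50, "deleted", None),   # pseudo-object marking the delete
    ],
)


def emp_at(conn, oid, ts):
    """Return (revision, name) of the row current at ts, or None."""
    row = conn.execute(
        "SELECT revision, status, name FROM emp "
        "WHERE oid = ? AND commit_ts <= ? "
        "ORDER BY commit_ts DESC LIMIT 1",
        (oid, ts),
    ).fetchone()
    if row is None or row[1] == "deleted":
        return None
    return row[0], row[2]


at_45 = emp_at(conn, 2, 45)
at_55 = emp_at(conn, 2, 55)
```

The query time does not have to match any commit timestamp: the row
with the latest timestamp at or before the requested time is taken,
and the status column tells the caller when that row is only a
delete marker.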

Archiving 

A lot of database data is created very rapidly in auditing
databases. At some point some of it must be moved to secondary
storage as archived data. As usual, auditing database
systems pose special challenges for thinning data without
corrupting the remaining objects.

What Is an Archive? Several types of archives are possible.
One common repository is a file containing object data in
a special format and probably compressed. Data is moved
to the archive using special archive utilities and must be
dearchived back into the active database for access using
the same special utilities. This method maximizes storage
compactness but pays for it by a cumbersome process to
retrieve the archived data when needed. Another possibility
is to move data to a separate data partition (table space)
that can be taken offline. Access to the archived data might
require dearchiving or, if the complexity is tractable,
unioning the archived data with the active data in queries.

At the other extreme is the use of a distributed database
system to connect the active and (possibly multiple) archive
databases. The archive medium, then, is just another
database that should have read-only access (except during an
archive operation by system utilities). A distributed
database system connects the active and archive databases during
the archive and dearchive processes, allowing the data to be
moved between databases as a distributed transaction.
This is the method we have chosen to use in our products.
A distributed archive system allows continued growth of
archived data while retaining reasonable access times when
necessary. Another advantage is the reliability of the archive
and dearchive processes because they are a distributed
transaction subject to two-phase commit protocols and
recovery mechanisms. Finally, it is possible to access archived
data automatically without dearchiving if the archive
database is on line. This indirect access feature is explained
more fully below.

Archiving Entire Objects. The first mechanism for thinning
data is to remove objects that will no longer be modified.
Generally status within the object indicates when this state
of life has been achieved or, perhaps, just the time since the
object was last modified is sufficient. Can we just remove all
revisions of the object from the active database and put them
in an archive record?

The first problem is simply finding the old object because it
might have been deleted. It might not even be in the list of
current objects in a nonaudited list. For example, in Fig. 3
we had better not delete a Dept or delete it from the DeptList
until the time comes to archive, or we will never be able to
find the orphaned object. When archiving a Dept it would be
an oversight to archive just the current Emps. What about
the one that was deleted earlier in the life of the Dept and is
referenced only in an old revision? Fig. 4 shows this to be
the case for Emp2 in Dept1. Evidently, it will be necessary to
search all the old revisions of all composite objects just to
identify all candidates for archiving. A special key field to
identify all components of a composite to be archived is a
big help here.

The second, admittedly mechanistic, worry is how to remove
an audited object, since deleting actually results in inserting
a new pseudo-object, and we can't even access a deleted
object at current time! Presumably some additional code
design and implementation provides a mechanism for
actually removing an audited object and all of its old revisions,
as well as accessing deleted objects. This operation is called
transfer out to distinguish it from deletion. Similarly, the
database must allow transfer in of multiple object revisions,
including pseudo-objects representing delete.

Now we can move on to the problem of other objects that
access the archived object. Because archiving is not deleting,
objects that reference an archived object need to retain
these references in case the archived object must be
accessed in the future. For example, we should retain an entry
in the nonaudited DeptList for an archived Dept object even if
it is not immediately accessible. One solution is to place a
status object on each relationship in the DeptList. This status
object can contain archive information. Another solution is
to replace the archived Dept object (and its components)
with a placeholder object that marks it as archived and
could also contain archive information. Unless we want to
start changing references in old objects, this new
placeholder object will have the same OID (object identifier) as
the old one. A variation on the second method is to record
archive information within the ODBMS and trap references
to archived objects.

These solutions work if the referencing object is not audited.
But what if it is audited? Updating the current object or
marking the status of its reference to the archived object
may be satisfactory for current time access but will result in
a load error if older revisions attempt to access the object
using references that were valid back when the old revision
was current. Unless we want to start updating old revisions
(a scary idea if we want to trust the integrity of audited data),
the archiving mechanism must handle these old references
between audited objects without modification or
qualification of the old references. The general solution to the
archive-reference problem probably must be implemented
at the database level. The database lock or load mechanism
must be able to distinguish a reference to an object that
never existed for the revision-time criteria specified from one
that existed but is now archived. The user must be notified
that the data is archived without disrupting normal processes.

Incremental (Time-Slice) Archiving. In some applications it
may not be practical to archive entire objects. The lifetime
of some archivable objects (actually composite objects with
thousands of component objects) in some systems can be as
long as five years, making archiving the object theoretically
possible at some time but not very useful for reducing online
data on a monthly or yearly basis. Clearly a mechanism for
archiving just the aged revisions of objects is necessary in
these applications.

The best way to specify incremental archiving is on a time
basis, because time can be applied uniformly to all objects.
In this scenario we could specify a list of candidate archive
objects and a threshold archive time, such that all revisions
of these objects found with a commit timestamp equal to or
earlier than the archive threshold would be moved to the
archive. Well, actually, not quite all of them! Since we must
satisfy requests by the active database for the object at the
threshold time, we must keep the one object revision with
a commit timestamp before the threshold time because this
revision is current at the threshold time (unless the object
was deleted, of course).
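
The selection rule can be written down compactly. In this sketch the
function name split_for_archive and the representation of a revision
as a (commit timestamp, is-delete) pair are assumptions, and the
timestamps are arbitrary integers.

```python
def split_for_archive(revisions, threshold):
    """Split revision numbers into (to_archive, to_keep).

    revisions is a list of (commit_ts, is_delete) pairs in ascending
    time order, with revision n stored at index n - 1.
    """
    aged = [n for n, (ts, _) in enumerate(revisions, start=1)
            if ts <= threshold]
    # The newest aged revision is still current at the threshold time,
    # so it stays in the active database unless it records a delete.
    if aged and not revisions[aged[-1] - 1][1]:
        aged.pop()
    keep = [n for n in range(1, len(revisions) + 1) if n not in aged]
    return aged, keep


ten_revisions = [(ts, False) for ts in range(10, 101, 10)]
first_slice = split_for_archive(ten_revisions, threshold=25)

deleted_object = [(10, False), (20, False), (30, True)]
deleted_slice = split_for_archive(deleted_object, threshold=35)
```

For the ten-revision object, a threshold between revisions 2 and 3
archives only revision 1, because revision 2 is still current at the
threshold. For the deleted object, nothing is current at the
threshold, so every revision can be moved.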

To implement this incremental archive mechanism, as
described so far, the system must keep track of the threshold
time and archive information about the revisions of each
object. Attempted access to revisions extant before the
archive time should receive an archive error and perhaps
supply the archive information so that the user knows where
the data can be found.

In this scenario, archiving probably is not a one-time
operation. What do we do with the remaining revisions of the
object when the archive operation is repeated a month later,
specifying a threshold archive time one month later than
that in the previous operation? From a bookkeeping point of
view, it would make sense to simply append the new archive
revisions of an object to the old ones in the archive and
update archive information in the active database. In practice
most customers will not find this method any more
acceptable than filing tax records by subject rather than date. Most
archived time slices will be kept as an archive record labeled
by the date range of the data it contains; it could be a tape
collecting dust in a rack. If we needed to append to an
archive whenever more revisions of a long-lived object were
archived, the archive operation would eventually require
mounting many archives. Thus, a practical archive
mechanism must allow various revisions of an audited object to be
scattered in multiple archive databases.

If a single object can be contained in multiple archives, we
must know which archive might contain the requested data.
Moreover, it would be nice to guarantee that the load
request could be satisfied if the archive were made available.
A customer will be upset if the archive supposedly
containing the missing data is found and mounted and then the
customer is told that the data is still missing! Thus, it will be most
convenient to retain in the active database complete
information about the range of revisions and commit timestamps
of an object in each archive. This archive record, called an
archive unit, contains information about the continuous
sequence of object revisions of an object that were
transferred in the archive operation.

An example of time-slice archiving is presented in Fig. 5.
An audited object identified by ObjNum 101 has created 10
revisions in the active database. At some time in the past, an
archive database was created, designated as 1995 here. The
first time-slice operation moved revision 1 to the archive
database and left an archive record in the active database.
The archived object acquired a new identifier, shown as 23,
because an ObjNum is unique only within a single database.
Subsequently, another archive operation moved revisions 2
and 3 to the same database, leaving another archive record.
The following year, another archive database was created
and revisions 4, 5, and 6 were archived here.

Dearchiving and Archive Access

Dearchive Operation. The process of dearchiving is just the
reverse of archiving, whether the archive medium is a
compressed file or a remote database. If incremental archiving
is used and an archive record is maintained in the active
database, it reduces bookkeeping to dearchive an archive
unit (group of continuous object revisions) and remove the
archive record from the active database. It is also
necessary to dearchive archive units continuously from the
youngest one to the target one to ensure the integrity of the
time-retrieval mechanism. There must be a continuous revision
sequence from the current timestamp to the timestamp
preceding or equal to the target timestamp.
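
The continuity requirement suggests a simple way to choose which
archive units to restore. In the sketch below the archive unit fields
(archive, first_ts, last_ts), the function name units_to_dearchive,
and the timestamps are all hypothetical.

```python
def units_to_dearchive(units, target_ts, oldest_active_ts):
    """List the archive units to restore, youngest first.

    units are dicts with 'archive', 'first_ts', and 'last_ts' keys.
    oldest_active_ts is the commit timestamp of the oldest revision
    still held in the active database.
    """
    if oldest_active_ts <= target_ts:
        return []   # the active database already covers the target
    needed = []
    for unit in sorted(units, key=lambda u: u["last_ts"], reverse=True):
        needed.append(unit)
        if unit["first_ts"] <= target_ts:
            break   # this unit holds the revision current at target_ts
    return needed


units = [
    {"archive": "A", "first_ts": 10, "last_ts": 10},   # revision 1
    {"archive": "A", "first_ts": 20, "last_ts": 30},   # revisions 2 and 3
    {"archive": "B", "first_ts": 40, "last_ts": 60},   # revisions 4 to 6
]
restore = units_to_dearchive(units, target_ts=35, oldest_active_ts=70)
```

A target time of 35 falls between two units, so both the youngest
unit and the one before it are restored, in that order, leaving no
gap in the revision sequence back to the target.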

Indirect Access to Archive Data. Of greater interest is the
possibility that dearchiving may not be necessary. If
archived data resides on archive databases in a distributed
database system, it is possible for a sophisticated object
manager to access archived data in remote archive
databases and integrate it with the active data. Important
advantages of this mechanism are:

• Reduced resources for the active database because
dearchiving is not necessary

• Transparent access to archived data by ordinary users

• Reduced administration, because the archive and dearchive
processes become simply distributed transactions without
introducing special mechanisms into the life of a system
administrator.

Fig. 5. Time-slice archive example.

This mechanism relies on maintenance of an archive record
in the active database that records information about each
archive unit placed in an archive database. The existence of
an archive record in the active database allows the active
database to return a forwarding reference instead of a load
error when a requested revision or time of an object has
been archived. The reference contains the address of the
archive database, allowing the object manager to proceed
to indirectly load the archived object as an alias for the
requested one. Obviously, alias objects must be marked to
prohibit update. The object manager can take the appropriate
action to access archived objects (or revisions of objects)
depending on the wishes of the user and system policy. In
our system, the object manager recognizes several access
modes to indicate how to treat archived data for each
application operation.
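
A forwarding reference of this kind might be modeled as follows. The
ArchiveUnit and Forward classes and the load_revision function are
hypothetical, as are the name "1996" for the second archive database
and the identifier 7 used inside it. The other values follow the
Fig. 5 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveUnit:
    """Archive record kept in the active database."""
    archive_db: str    # address of the archive database
    archive_oid: int   # identifier of the object inside that archive
    first_rev: int
    last_rev: int


@dataclass(frozen=True)
class Forward:
    """Forwarding reference returned in place of a load error."""
    unit: ArchiveUnit


class LoadError(Exception):
    """The requested revision never existed."""


def load_revision(active, units, rev):
    """Return the image, a Forward, or raise LoadError."""
    if rev in active:
        return active[rev]
    for unit in units:
        if unit.first_rev <= rev <= unit.last_rev:
            return Forward(unit)
    raise LoadError(rev)


units = [
    ArchiveUnit("1995", 23, 1, 1),
    ArchiveUnit("1995", 23, 2, 3),
    ArchiveUnit("1996", 7, 4, 6),
]
active = {rev: f"ObjNum 101 r{rev}" for rev in range(7, 11)}

hit = load_revision(active, units, 8)
forwarded = load_revision(active, units, 2)
```

A caller that receives a Forward can decide, according to its access
mode, whether to follow it to the archive database and load the
object there as a read-only alias.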

Conclusion

The trend towards requiring audit trails of more and more
processes is driving new database capabilities. Old models
of audit logging and periodic archives do not provide routine
access to audit data and are not scalable to large systems.
We should not view auditing as a specialized,
application-specific capability to be overlaid on a general-purpose
database. Object database systems are well-suited to implement
this new technology because much of the technology can be
incorporated efficiently within the DBMS, freeing the
designer and programmer from many of the new complexities
introduced in the discussion above. Ad hoc implementations
using stored procedures, triggers, or other enhancements of
relational databases will have difficulty matching the
efficiency of systems in which auditing is an implicit capability.

Auditing objects in complex schemas and archiving the data
in a distributed environment are complex processes that
would appear to be difficult to implement in ordinary
applications. On the contrary, we have found that these capabilities
can be used reliably by application developers because most
of the complexity can be concentrated in the object manager
of an ODBMS and core class code. Similarly, access to
archived data can be nearly transparent to most application
code with judicious use of access modes and exception traps
if the object manager implements automatic indirect access
to archive databases.

The ambitious goals of rapid access to active data,
convenient access to old data, practical database size, and
reasonable application complexity can be achieved in an internally
audited system by careful design of a distributed database
system.

Testing Policing in ATM Networks

Policing is one of the key mechanisms used in ATM (Asynchronous 
Transfer Mode) networks to avoid network congestion. The HP E4223A 
policing and traffic characterization test application has been developed 
to test policing implementations in ATM switches before the switches are 
deployed for commercial service.

by Mohammad Makarechian and Nicholas J. Malcolm



The Asynchronous Transfer Mode (ATM) is a network technology
that can satisfy the quality-of-service requirements of
many different types of traffic. The ability of ATM to handle
many different types of traffic and its ability to operate at
high bandwidths position it to be one of the core technologies
behind future broadband wide area networks and the
Internet. To provide quality-of-service guarantees, ATM relies
crucially upon avoiding network congestion. Congestion
can result in unacceptably large cell loss or delays. Cells that
are lost may have to be retransmitted, which can result in
increased congestion. Cells with excessive delays can cause
higher-layer protocol timers (e.g., TCP/IP timers) to expire,
which will result in even more cells being retransmitted. For
video traffic, excessive delays or cell delay variation can
result in underflow in the video decoder buffers. This can
cause jagged movements or screen freezes when playing
back the video.

Policing is one of the key mechanisms used by ATM to avoid
network congestion. Policing is responsible for monitoring
the amount of traffic sent by a connection. If a connection is
sending more than the agreed-upon amount of traffic, then
policing can discard traffic from the offending connection.
By preventing too much traffic from entering the network,
policing helps to avoid network congestion. This ensures
that existing connections in the network will continue to
receive their required quality of service.

Policing occurs at the user-network interface (UNI), where
user traffic first enters a public network, and at the
broadband ISDN intercarrier interface (B-ICI), where traffic
crosses from one public network to another. Policing is
known as usage parameter control (UPC) at the UNI and
network parameter control (NPC) at the B-ICI.

Given the importance of policing to ATM, it is essential that
policing be well-tested. Policing must be tested both by
network equipment manufacturers when developing switches,
and by network providers when commissioning switches.
The HP E4223A policing and traffic characterization test
application has been developed to test policing implementations
in ATM switches. This product allows users to generate
policing test traffic and to measure how the traffic is affected
by policing. In this way the HP E4223A can thoroughly test
policing in ATM switches before the switches are deployed
for commercial service. The HP E4223A can also analyze the
traffic originating from a traffic source to determine whether
the source is sending too much traffic into the network.



How Policing Works 

For ATM to meet its quality-of-service commitments, it is
essential to reduce or eliminate network congestion.
Congestion can result in unacceptably poor network performance.
ATM attempts to avoid congestion by managing network
resources (e.g., transmission links, buffer space inside
switches) in such a way that congestion will not occur.
A connection will only be established if there are enough
network resources to provide an acceptable quality of service
to the new connection without disrupting the service
provided to existing connections. Once a connection has
been established, usage parameter control (UPC) is responsible
for policing the connection traffic when it enters the
network and ensuring that the traffic does not exceed the
agreed-upon traffic rate.

Depending on the requirements of the traffic source, ATM
provides a variety of service categories, as shown in Fig. 1.¹
For example, an application such as digital voice may be
suited to constant bit rate (CBR) service, while compressed
video may be suited for real-time variable bit rate (rt-VBR)



List of Acronyms

ATM      Asynchronous Transfer Mode
B-ICI    Broadband ISDN intercarrier interface
BSTS     Broadband Series Test System
CBR      Constant bit rate
CDVT     Cell delay variation tolerance
CLP      Cell loss priority
GCRA     Generic cell rate algorithm
MBS      Maximum burst size
NPC      Network parameter control
PCR      Peak cell rate
PVC      Permanent virtual connection
rt-VBR   Real-time variable bit rate
SCR      Sustainable cell rate
SVC      Switched virtual connection
TAT      Theoretical arrival time
TCP/IP   Transmission Control Protocol/Internet Protocol
UNI      User-network interface
UPC      Usage parameter control
VBR      Variable bit rate
VPI/VCI  Virtual path identifier/virtual channel identifier



August 1997 Hewlett-Packard Journal

© Copr. 1949-1998 Hewlett-Packard Co.



Service Category*: Constant Bit Rate (CBR)
  Characteristics:
    • Static amount of bandwidth available
    • Low cell loss ratio
  Example Applications: Video/audio on demand, video
    conferencing, digital telephony
  Traffic Parameters:
    • PCR, CDVT on CLP = 0+1 cells

Service Category*: Real-Time Variable Bit Rate (rt-VBR)
  Characteristics:
    • Supports bursty traffic
    • Tightly bounded cell delay variation
    • Low cell loss ratio
  Example Applications: Compressed video, distributed classroom
  Traffic Parameters:
    • PCR, CDVT and SCR, MBS on CLP = 0+1 cells
    • PCR, CDVT on CLP = 0+1 cells, SCR, MBS on CLP = 0 cells
    • PCR, CDVT on CLP = 0+1 cells, SCR, MBS on CLP = 0 cells,
      tagging applicable

Service Category*: Non-Real-Time Variable Bit Rate (nrt-VBR)
  Characteristics:
    • Supports bursty traffic
    • No cell delay variation bounds
    • Low cell loss ratio
  Example Applications: Airline reservations, banking
    transactions
  Traffic Parameters: as for rt-VBR

Service Category*: Unspecified Bit Rate (UBR)
  Characteristics:
    • No cell delay variation bounds
    • No cell loss ratio bounds
    • "Best effort" service
  Example Applications: File transfer, email
  Traffic Parameters:
    • PCR, CDVT on CLP = 0+1 cells
    • PCR, CDVT on CLP = 0+1 cells, tagging applicable

* The ATM Forum also defines a service category called available bit rate (ABR). Policing of ABR connections is not discussed in this article.
Fig. 1. ATM service categories.



service. The service category used by a connection is chosen
at connection setup time. For each service category,
several traffic parameters are given to the network to
describe the type of traffic that will be sent by the connection.
For switched virtual connections (SVCs), the traffic
parameters are given to the network during the call setup or
renegotiation phase of signaling. For permanent virtual
connections (PVCs) the traffic parameters can be specified
manually at subscription time. The traffic parameters are
used by the network to police the traffic on the connection
and to determine how many network resources must be
reserved to support the connection.

Cells in an ATM network can be given a high priority or a
low priority. High-priority cells have the cell loss priority
(CLP) bit in their headers set to 0, while low-priority cells
have a CLP of 1. Low-priority cells are more likely to be
discarded if the network becomes congested. At the minimum,
the traffic parameters declared to the network at connection
setup time include the peak cell rate (PCR) and the cell delay
variation tolerance (CDVT) for CLP = 0+1 cells, that is, for
all cells in the connection, regardless of priority.

The PCR is the maximum rate at which the source may
generate traffic. The CDVT indicates how many back-to-back
cells there may be at the user-network interface. Together,
the PCR and the CDVT give the network an idea of when
to expect the next arrival of a cell given that one has just
arrived. In addition, for variable bit rate service categories,
a traffic source can also specify a sustainable cell rate (SCR)
and a maximum burst size (MBS). The SCR gives an upper
bound on the conforming cell rate of a VBR connection. The
MBS gives the maximum burst size for a VBR connection,
assuming that the cells in the burst arrive at the PCR.
Specifying the SCR and MBS allows the network to allocate
resources such as buffer space more efficiently because the
network has more knowledge about the type of traffic that
will be generated.

The UPC function is responsible for ensuring that traffic
on a connection does not exceed the agreed-upon rate.
The policing function in a switch first validates the VPI/VCI
(virtual path identifier/virtual channel identifier) of arriving
cells, and then determines whether or not the cells are
conforming to the agreed-upon PCR or SCR. Whether or not
a cell is conforming is determined by an algorithm called the
generic cell rate algorithm (GCRA),¹* popularly known as
the "leaky bucket" algorithm (Fig. 2). The GCRA has two
parameters, denoted T and τ. The first parameter, T, is the
emission interval and can be regarded as the expected
interarrival time of conforming cells. The second parameter, τ, is
the cell delay variation tolerance (CDVT) and determines
how many back-to-back cells are allowed. The GCRA maintains
a variable called the theoretical arrival time (TAT),
which gives the expected arrival time of the next cell. Cells
arriving more than τ units of time before the TAT are
considered to be nonconforming. Nonconforming cells can be
tagged (given a lower priority) or discarded by the switch.
Cells arriving τ units of time before the TAT or later are
considered to be conforming. For each conforming cell the TAT
is updated to give the expected arrival time of the next cell
in the connection.

* The GCRA is a reference algorithm used to define conformance. An actual UPC
implementation may use the GCRA or another algorithm provided that the
quality-of-service objectives for connections are met.
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Fig. 2. Generic cell rate (leaky bucket) algorithm.
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For an example of how the GCRA can be used to police a
CBR connection, suppose that a connection has a peak cell
rate of PCR = 8000 cells/s and a CDVT of 0 μs. The GCRA
emission interval is calculated as T = 1/PCR = 125 μs. The
GCRA CDVT is τ = 0 μs. Table I shows the resultant
conforming and nonconforming cells assuming that a cell on
the connection arrives every 120 μs.



Table I
Conforming and Nonconforming Cells
for PCR = 8000 cells/s, τ = 0 μs

Cell #   Arrival (μs)   TAT (μs)   Conforming?
1        0              0          yes
2        120            125        no
3        240            125        yes
4        360            365        no
5        480            365        yes
6        600            605        no
7        720            605        yes
8        840            845        no
9        960            845        yes
10       1080           1085       no



To see how the CDVT can be increased to allow more
back-to-back cells, suppose that the CDVT used by the GCRA
is increased to τ = CDVT = 11 μs. The resultant pattern of
conforming and nonconforming cells is shown in Table II.



Table II
Conforming and Nonconforming Cells
for PCR = 8000 cells/s, τ = 11 μs

Cell #   Arrival (μs)   TAT (μs)   Conforming?
1        0              0          yes
2        120            125        yes
3        240            250        yes
4        360            375        no
5        480            375        yes
6        600            605        yes
7        720            730        yes
8        840            855        no
9        960            855        yes
10       1080           1085       yes
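
The conformance decisions in Tables I and II can be reproduced with a few lines of code. The following Python sketch implements the GCRA as it is described in the text (a cell conforms if it arrives no more than τ before the TAT, and only conforming cells advance the TAT). The function name `gcra` and its return format are illustrative choices, not part of the HP E4223A.

```python
def gcra(arrivals, T, tau):
    """Classify cell arrival times with the GCRA described in the text.

    arrivals: cell arrival times in microseconds, in increasing order
    T:        emission interval (1/PCR), in microseconds
    tau:      cell delay variation tolerance (CDVT), in microseconds
    Returns a list of (arrival, TAT seen by the cell, conforming?) tuples.
    """
    results = []
    tat = arrivals[0] if arrivals else 0   # first cell defines the initial TAT
    for t in arrivals:
        conforming = t >= tat - tau
        results.append((t, tat, conforming))
        if conforming:
            # Only conforming cells advance the theoretical arrival time.
            tat = max(t, tat) + T
    return results

arrivals = [120 * k for k in range(10)]    # one cell every 120 us
table_1 = gcra(arrivals, T=125, tau=0)
table_2 = gcra(arrivals, T=125, tau=11)
for cell, (t, tat, ok) in enumerate(table_2, start=1):
    print(cell, t, tat, "yes" if ok else "no")
```

With τ = 0 every second cell is rejected, because the source sends one cell every 120 μs against an emission interval of 125 μs. With τ = 11 μs the tolerance absorbs the 5-μs shortfall for three consecutive cells before a cell is rejected.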



As mentioned earlier, some service categories also have
sustainable cell rate (SCR) and maximum burst size (MBS)
parameters. In this case, one GCRA is used to police the
peak cell rate and another GCRA is used to police the
sustainable cell rate. The GCRA tolerance used with the SCR
GCRA (τ_SCR) is derived from the PCR, SCR, MBS, and CDVT
parameters,¹ and permits a burst of MBS cells at the peak
cell rate. The PCR and SCR GCRAs form a dual leaky bucket
algorithm and operate in lockstep fashion. A cell is only
considered to be conforming if it conforms to both GCRAs.
If tagging is allowed, then high-priority CLP = 0 cells can be
tagged (given a lower priority) if they do not conform to the
SCR GCRA.
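
A minimal Python sketch of the dual leaky bucket follows. The article does not state how the SCR tolerance is derived; the sketch assumes the burst tolerance formula from the ATM Forum traffic management specification, (MBS − 1)(1/SCR − 1/PCR), plus the CDVT. The class and parameter names are hypothetical. Times are in cell slots, and exact fractions are used so that rounding does not disturb the conformance boundary.

```python
from fractions import Fraction

class DualLeakyBucket:
    """PCR and SCR GCRAs operating in lockstep (times in cell slots)."""

    def __init__(self, pcr, scr, mbs, cdvt=0):
        self.t_pcr = 1 / Fraction(pcr)      # emission interval at the PCR
        self.t_scr = 1 / Fraction(scr)      # emission interval at the SCR
        self.tau_pcr = Fraction(cdvt)
        # Assumed burst tolerance: lets MBS cells pass at the peak rate.
        self.tau_scr = (mbs - 1) * (self.t_scr - self.t_pcr) + Fraction(cdvt)
        self.tat_pcr = None
        self.tat_scr = None

    def conforms(self, t):
        t = Fraction(t)
        if self.tat_pcr is None:            # first cell initializes both TATs
            self.tat_pcr = self.tat_scr = t
        ok = (t >= self.tat_pcr - self.tau_pcr and
              t >= self.tat_scr - self.tau_scr)
        if ok:                              # update both buckets together
            self.tat_pcr = max(t, self.tat_pcr) + self.t_pcr
            self.tat_scr = max(t, self.tat_scr) + self.t_scr
        return ok

# PCR = 100% of line rate (1 cell/slot), SCR = 35%, MBS = 3 cells, CDVT = 0
bucket = DualLeakyBucket(pcr=1, scr=Fraction(35, 100), mbs=3)
burst = [bucket.conforms(slot) for slot in range(4)]
print(burst)   # three back-to-back cells conform, the fourth does not
```

Neither TAT is updated when a cell fails either test, which is one way to realize the lockstep behavior described above. Tagging is not modeled in this sketch.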

In addition to its role in the UPC function, the GCRA can
also be used inside a traffic source to ensure that the
outgoing cell flow conforms to a particular cell rate. This is
referred to as traffic shaping. When nonconformance is
detected during shaping, the offending cells are delayed until
their transmission will be conforming. In this way, traffic
can be guaranteed conforming before it enters the network.

The HP BSTS Policing Application

The HP E4223A policing and traffic characterization test
application is designed to test UPC implementations in
network equipment and to analyze the characteristics of traffic
on a connection. The HP E4223A is part of the HP Broadband
Series Test System (BSTS).² The HP Broadband Series Test
System contains a number of VXIbus modules that allow
testing of broadband networks over a variety of physical
interfaces. For brevity, the HP E4223A will be denoted the
HP BSTS policing application during the remainder of this
article.

The HP BSTS policing application works with the HP E4209
cell protocol processor, a VXIbus module forming part of the
HP BSTS. The cell protocol processor in combination with a
line interface module can transmit and receive ATM cells for
testing purposes. Received ATM cells can be stored in a
capture RAM for later analysis. The HP BSTS policing application
consists of embedded software running on the cell protocol
processor module and a software component running on HP
9000 Series 400 or 700 workstations.

The HP BSTS policing application provides the following
functions:

• Generates traffic conforming to a single or dual leaky bucket
algorithm (GCRA).

• Generates UPC test cells, which are test cells designed for
testing policing.

• Makes a number of policing-related measurements on
captured ATM cells, such as the number of nonconforming
cells and the number of cells that were lost or tagged.

• Makes general performance measurements on captured
ATM cells, such as cell delay, interarrival time, and
one-point cell delay variation.

Traffic Generation 

When generating traffic to test policing, it is important to
test the limits of the GCRA being used for policing. This
means that it is important to generate traffic that has the
maximum cell rate and burst size but is still conforming.
Because policing is configured in a switch using the parameters
of a GCRA, it is convenient to use the parameters of a
GCRA when specifying traffic to test policing. Consequently,
the HP BSTS policing application provides a GCRA distribution,
which allows traffic to be generated using the parameters
of a GCRA (Fig. 3). The GCRA combinations supported
are:

• PCR CLP = 0+1 (single leaky bucket)

• SCR CLP = 0+1 and PCR CLP = 0+1 (dual leaky bucket).

The GCRA distribution can be optimized to generate traffic
based on either the cell rate or the burst size. This allows
independent testing of how policing implementations handle
cell rates and burst sizes.

Traffic that optimizes the burst size consists of repeated
bursts of the maximum possible conforming burst size. The
burst is at the line rate for the single leaky bucket. For the
dual leaky bucket, the burst consists of MBS cells at the peak
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Fig. 3. Specifying the generic cell rate algorithm distribution
with the HP E4223A policing and traffic characterization test
application.

cell rate. The spacing between the bursts is the minimum
necessary to maintain conforming traffic. For example, suppose
a dual leaky bucket is chosen with SCR = 35%, MBS = 3
cells, PCR = 100%, and CDVT = 0 ms. The generated traffic
will consist of repeated bursts of three cells at 100% of the
line rate, with a gap of six cells between any two bursts.
Because cell transmission is quantized, the generated load is
33.3%, which is less than the SCR. Traffic that optimizes the
burst size has very precisely controlled burst sizes, but the
rate of the generated traffic may be less than requested.
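
The gap of six cells and the resulting load can be checked with a short calculation. The sketch below is not the HP BSTS implementation. It assumes PCR = 100% of the line rate and CDVT = 0, in which case a complete burst of MBS cells conforms only if it starts at least MBS/SCR cell slots after the previous burst started. The function name is hypothetical.

```python
import math
from fractions import Fraction

def burst_optimized_pattern(scr, mbs):
    """Repeating pattern of MBS-cell bursts at the line rate (PCR = 100%).

    scr: sustainable cell rate as a fraction of the line rate
    mbs: maximum burst size in cells
    Returns (gap between bursts in cell slots, generated load).
    """
    t_scr = 1 / Fraction(scr)            # SCR emission interval, in slots
    # A full burst conforms only if it starts MBS * T_scr after the
    # previous one started; cell slots are discrete, so round up.
    period = math.ceil(mbs * t_scr)
    return period - mbs, Fraction(mbs, period)

gap, load = burst_optimized_pattern(Fraction(35, 100), mbs=3)
print(gap, float(load))
```

For SCR = 35% and MBS = 3 the ideal period is 3/0.35 ≈ 8.57 slots, which rounds up to 9 slots: three cells followed by a six-slot gap, for a load of one third.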

Traffic that optimizes the cell rate consists of traffic
generated at the maximum conforming rate. The traffic will
consist of an initial burst of the maximum conforming burst
size, followed by cells at the PCR (for a single leaky bucket)
or at the SCR (for a dual leaky bucket). For example, with
a dual leaky bucket with SCR = 35%, MBS = 3 cells, PCR =
100%, and CDVT = 0 ms, the generated traffic consists of an
initial burst of three cells, followed by conforming traffic at
the SCR. When optimizing the cell rate, the generated traffic
rate can be much closer to the requested rate than traffic
that optimizes the burst size.
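
One way to picture rate-optimized traffic is as a greedy schedule that transmits a cell in every slot in which the cell would still conform. The Python sketch below illustrates that idea and is not the generator used by the HP BSTS. It assumes PCR = 100% of the line rate and CDVT = 0, so only the SCR bucket limits the traffic, and it assumes the burst tolerance (MBS − 1)(1/SCR − 1/PCR) from the ATM Forum traffic management specification.

```python
from fractions import Fraction

def rate_optimized_slots(scr, mbs, n_slots):
    """Greedy schedule: send each cell in the earliest conforming slot.

    Assumes PCR = 100% of the line rate and CDVT = 0.
    Returns the list of cell slots in which a cell is transmitted.
    """
    t_scr = 1 / Fraction(scr)
    tau_scr = (mbs - 1) * (t_scr - 1)    # assumed burst tolerance, in slots
    slots, tat, t = [], Fraction(0), 0
    while t < n_slots:
        if t >= tat - tau_scr:           # conforming: transmit in this slot
            slots.append(t)
            tat = max(Fraction(t), tat) + t_scr
        t += 1
    return slots

slots = rate_optimized_slots(Fraction(35, 100), mbs=3, n_slots=700)
print(slots[:6], len(slots) / 700)
```

Under these assumptions the schedule starts with a three-cell burst and then settles to a long-run load within a fraction of a percent of the 35% SCR, compared with 33.3% for the burst-optimized pattern.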

Policing Measurements 

When generating traffic to test policing, the number of tagged
or discarded cells must be measured. It is difficult for test
equipment to make these measurements with regular user
traffic. This is because the test equipment usually does not
know how many user cells were transmitted or the original
priority of the user cells. For this reason, test cells are
often used to measure policing performance. The HP BSTS
policing application can transmit sequences of UPC test cells,
each cell having the format shown in Fig. 4. The UPC test
cells are specifically designed for testing policing with the



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Fig. 5. UPC test cell measurements.

HP BSTS policing application. The payload of each UPC test
cell within a sequence contains the following information:

• SN0+1. The number of previous test cells in the sequence.

• RES. Reserved bytes, set to zero.

• SL0+1. The total number of test cells in the sequence. The
HP BSTS policing application can repeatedly transmit
sequences of 51:^ or 1021 test cells.

• SL0. The total number of high-priority test cells in the
sequence.

• OCLP. The priority of the test cell when it is first
transmitted. The low-order bit is set to 0 for high-priority cells
and 1 for low-priority cells. The remaining bits in this field
are set to zero.

• SN0. The number of previous high-priority test cells in the
sequence.

• VN. The version number of the test cell format. The current
version number is ih

• CRC-16. A cyclic redundancy check error code to provide
protection and validation of the encoded payload information.
The CRC-16 code is computed using the polynomial
x^16 + x^12 + x^5 + 1.
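
For readers who want to experiment with the CRC-16 field, the following Python sketch computes a CRC with the generator polynomial x^16 + x^12 + x^5 + 1 (0x1021). The article gives only the polynomial; the initial register value of zero, the most-significant-bit-first processing order, and the function name are assumptions, so the result may not match the value an actual UPC test cell carries.

```python
def crc16(data: bytes, poly: int = 0x1021, init: int = 0x0000) -> int:
    """Bitwise CRC-16 for x^16 + x^12 + x^5 + 1, most significant bit first."""
    crc = init
    for byte in data:
        crc ^= byte << 8                 # bring the next byte into the top bits
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

print(hex(crc16(b"123456789")))
```

A receiver would recompute the CRC over the payload fields of a captured test cell and treat a mismatch as a corrupted cell rather than as a lost or tagged one.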

The information contained in the payload of UPC test cells
allows a number of measurements to be made on captured
ATM cells (Fig. 5). These measurements include the number
of lost or tagged cells, which are measurements directly
relevant to testing policing.
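
As an illustration of how the payload fields support these measurements, the sketch below counts lost, tagged, and lost high-priority cells for one captured sequence. It is a simplified, hypothetical reconstruction and not the algorithm used by the HP BSTS: it assumes the capture holds the surviving cells of exactly one test sequence, and the class and field names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class CapturedTestCell:
    sn: int      # SN0+1: number of previous test cells in the sequence
    sl: int      # SL0+1: total number of test cells in the sequence
    sl0: int     # SL0: total number of high-priority test cells
    oclp: int    # OCLP: priority when first transmitted (0 = high)
    clp: int     # CLP bit in the header of the captured cell

def policing_counts(cells):
    """Count lost, tagged, and lost high-priority cells for one sequence."""
    if not cells:
        return None
    sent, sent_high = cells[0].sl, cells[0].sl0
    received_high = sum(1 for c in cells if c.oclp == 0)
    return {
        "lost": sent - len(cells),
        # Tagged: sent with high priority, captured with low priority.
        "tagged": sum(1 for c in cells if c.oclp == 0 and c.clp == 1),
        "lost_high_priority": sent_high - received_high,
    }

# Hypothetical capture: 6 cells sent (4 high priority); the cell with
# sequence number 3 was discarded and the cell with number 1 was tagged.
capture = [
    CapturedTestCell(0, 6, 4, 0, 0),
    CapturedTestCell(1, 6, 4, 0, 1),
    CapturedTestCell(2, 6, 4, 1, 1),
    CapturedTestCell(4, 6, 4, 0, 0),
    CapturedTestCell(5, 6, 4, 1, 1),
]
print(policing_counts(capture))
```

Because each cell carries its original priority and the sequence totals, the counts can be derived from the captured cells alone, without any record of what was transmitted.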

In addition to making measurements with UPC test cells, the
HP BSTS policing application can measure the conformance
of traffic on a connection (Fig. 6). Conformance is measured
by saving ATM cells in the capture RAM and then measuring
the number of captured nonconforming cells. These
measurements can be used to test the number of nonconforming
cells detected by a switch or to test whether the traffic on a
connection is conforming.
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Testing Policing in a Switch

Policing in ATM switches must work correctly if ATM is to
realize its potential for providing guaranteed quality of
service for many different types of traffic. This requires that
policing be thoroughly tested, both during switch development
and during switch deployment. There are basically two
aspects of policing to be tested: conforming cells should not
be tagged or discarded, and nonconforming cells should be
tagged or discarded to protect the quality of service provided
to other connections. To test the above aspects of policing,
the number of lost cells, the number of tagged cells, and the
number of lost high-priority cells must all be measured
(Table III).

Table III
Parameters to Measure when Testing Policing

Parameter            Description

Lost cells           Number of cells discarded or lost in the
                     switch. This parameter is used to check that
                     policing is not discarding too many cells.

Tagged cells         Number of cells tagged (changed from high to
                     low priority) by the switch. This parameter is
                     used to check that policing only changes the
                     priority of a cell when necessary.

Lost high-priority   Number of high-priority cells discarded or lost
cells                in the switch. This parameter is used to check
                     that policing is not discarding too many
                     high-priority cells.

When testing policing in a switch with the HP BSTS policing
application, the approach is to transmit a well-understood
stream of test cells into the switch, capture the cells after
they have traversed the switch, and then calculate how many
cells were tagged or discarded. This approach is well-suited
for stimulus-response type testing to test the capabilities of
the switch systematically.



To simplify testing, the overall philosophy when testing
policing in a switch is to test one GCRA parameter at a time.
This means keeping the cell rate constant while varying the
burst size, or keeping the burst size constant while varying
the cell rate. Fig. 7 shows how to test the cell rate of a single
leaky bucket. The switch is first configured with the leaky
bucket parameters to be tested, in this case the PCR and
CDVT for a single leaky bucket. The PCR and CDVT used to
generate test traffic are then entered, with the PCR used to
generate the traffic being lower than the PCR in the switch.
The testing then iterates between measuring the number of
tagged or lost cells and increasing the PCR. If the number of
tagged or lost cells differs from what is expected, a potential
defect is logged.

Example: Testing a single leaky bucket in a switch. The
approach in Fig. 7 was followed to test a single leaky bucket
(GCRA) in a switch with PCR = 8 Mbits/s and CDVT = 60 ms
on a 155-Mbit/s SONET port. The HP BSTS policing application
was used to generate traffic consisting of a repeating
sequence of 1021 UPC test cells conforming to a GCRA with
PCR = 4% (5.9 Mbits/s) and CDVT = 60 ms. The traffic was
sent through the switch and back into the HP BSTS, where it
was placed in the cell protocol processor capture RAM. The
ratio of lost cells was calculated based on the captured UPC
test cells. The PCR was then incremented by 0.5% for the
next iteration of the test. The expected cell loss ratio was 0
if the generated cell rate was less than the policing cell rate,
otherwise the expected cell loss ratio was the proportion of
generated cell rate greater than the policing cell rate, that
is (PCR_generated − PCR_policing)/PCR_generated. Table IV shows
the test results.
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Table IV
Test Results for a Single GCRA in a Switch

PCR for Generating Traffic   Cell Loss Ratio   Expected Result?
4.0%                         0.00              yes
4.5%                         0.00              yes
5.0%                         0.00              yes
5.5%                         0.02              yes
6.0%                         0.10              yes
6.5%                         0.17              yes
7.0%                         0.23              yes
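
The expected values in Table IV follow from the loss-ratio rule given in the example. The Python sketch below evaluates that rule for each generated PCR. The article does not say what the percentages are relative to; the sketch assumes they refer to the 149.76-Mbit/s cell payload capacity of a 155-Mbit/s SONET link, and it assumes the tabulated ratios are truncated to two decimals. Function and variable names are illustrative.

```python
import math

def expected_cell_loss_ratio(pcr_generated, pcr_policing):
    """Expected loss ratio when policing discards all excess traffic."""
    if pcr_generated <= pcr_policing:
        return 0.0
    return (pcr_generated - pcr_policing) / pcr_generated

# Assumption: percentages are relative to the 149.76-Mbit/s cell payload
# capacity of a 155-Mbit/s SONET link.
LINE_RATE_MBITS = 149.76
policing_pct = 100 * 8 / LINE_RATE_MBITS      # switch PCR of 8 Mbits/s

generated_pct = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
expected = [expected_cell_loss_ratio(p, policing_pct) for p in generated_pct]
# Truncate to two decimals for comparison with the tabulated values.
truncated = [math.floor(100 * e) / 100 for e in expected]
print(truncated)
```

Under these assumptions the switch PCR corresponds to about 5.34% of the link, so loss first appears at the 5.5% step, as in the table.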



Testing Traffic Conformance

Although policing in network switches will protect the
network from traffic sources that send too much traffic, it is
also important for traffic sources themselves to generate
conforming traffic if possible. If a source generates
nonconforming traffic, then the nonconforming cells will be
discarded by the network and may have to be retransmitted by
the source. This can significantly degrade the network
performance experienced by the traffic source. The HP BSTS
policing application can be used to check whether a source
is generating conforming traffic.

Example: Testing the conformance of MPEG-2 video traffic. This
example demonstrates how to use the HP BSTS policing
application to measure the conformance of a traffic source.
As shown in Fig. 8, a laser disk player was connected to a
commercial MPEG-2 encoder with a 45-Mbit/s DS3 ATM
output. The encoder was set up to generate MPEG-2 video
over ATM at 4 Mbits/s. The user's guide for the encoder
states that a CDVT of 100 ms should be sufficient to
compensate for the effects of adapting MPEG-2 packets to ATM
cells.

The ATM output of the MPEG-2 encoder was first sent
directly to the HP BSTS, where the MPEG-2 traffic was placed
in the cell protocol processor capture RAM. To verify that
the MPEG-2 traffic was being captured correctly the HP
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E4220B MPEG-2 protocol viewer test software was used to
play back the captured video segment. The HP BSTS policing
application was then used to measure the number of
nonconforming cells with a PCR CLP = 0+1 GCRA. The GCRA
parameters were chosen conservatively to be PCR = 4.07
Mbits/s and CDVT = 200 ms. The HP BSTS policing application
measurements showed that the MPEG-2 encoder was
not well-behaved, with approximately 25% of the cells being
nonconforming.

To see the effect of the nonconforming cells on the video
traffic, the output of the MPEG-2 encoder was then directed
to an ATM switch before being routed to the HP BSTS. The
switch was configured to police the MPEG-2 traffic with
PCR = 4.07 Mbits/s and CDVT = 200 ms. Like the HP BSTS,
the switch detected approximately 25% of the cells as being
nonconforming. These nonconforming cells were discarded
by the switch. The remaining cells passed through the switch
and were captured in the cell protocol processor capture
RAM. However, because of the large number of ATM cells
that were discarded by the switch, it was not possible to
play back even one video frame. This example clearly
demonstrates the importance of generating conforming ATM
traffic and shows how the HP BSTS policing application can
be used to test the conformance of a traffic source.

Conclusion 

Policing network traffic at the UNI or B-ICI is crucial to
maintaining quality-of-service guarantees in ATM-based
networks. The ability to support the quality-of-service
requirements of many different types of traffic is one of the
distinguishing features of ATM. This feature means that ATM is
well-suited to providing the backbone network for future
broadband wide area networks and the Internet. The HP
BSTS policing application enables switch vendors and service
providers to test policing and helps ensure the successful
deployment of ATM.
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MOSFET Scaling into the Future 



2D process and device simulators have been used to predict the
performance of scaled MOSFETs spanning the 0.35-μm to 0.07-μm
generations. Requirements for junction depth and channel doping are
discussed. Constant-field scaling is assumed. MOSFET drive current
remains nearly constant from one generation to the next and most of the
performance improvement comes from the decreasing supply voltage.
Gate delay decreases by 30% per generation, nearly the same trend as
previous generations. However, this performance gain comes at the price
of much higher off-state leakage because of the reduction of the threshold
voltage. Various solutions to this high leakage are discussed.

by Paul Vande Voorde



Hewlett-Packard adopted CMOS technology in the mid-1970s.
At that time the gate length L_g was 4 μm and the gate
oxide thickness T_ox was 50 nm. Since then, each new
generation of technology has shrunk L_g by about 30% and T_ox by
about 25%. The decrease in L_g has been tied to the evolution
of lithography equipment. Following these scaling trends,
intrinsic gate delay has decreased about 30% per generation.
New generations of technology are released about every
three years. The important principle in MOSFET scaling is
that L_g and T_ox must decrease together. Scaling one without
the other does not yield adequate performance improvement.

The performance metric for gate delay is CV/I, where C is
the load capacitance, V is the supply voltage (V_dd), and I is
the drive current of the MOSFETs (average of NMOS and
PMOS). C is composed of both gate and junction capacitance.
MOSFET scaling, which decreases L_g, T_ox, and junction
area while increasing substrate doping, tends to keep
C fairly constant from generation to generation. For several
generations of technology, the supply voltage was held
constant at 5V (constant-voltage scaling). In that era, gate delay
was reduced by ever-increasing MOSFET drive currents.
Since the voltage was held constant while the dimensions
decreased, the electric fields continuously increased. High
fields and high currents tend to damage the gate oxide and
lead to device deterioration. Thus, one of the main technology
challenges has been to design MOSFETs with adequate
reliability.

Constant-voltage scaling ended as L_g approached 0.5 μm and
T_ox neared 10 nm. The demands of gate oxide reliability
required that the supply voltage be reduced. This occurred as
the peak oxide field reached roughly 4 MV/cm. We are now
in an era where supply voltage is scaled along with T_ox so
that the peak oxide electric field remains roughly constant
(constant-field scaling). This study examines some of the
implications of this type of scaling in future technology
generations.

Process and Device Simulations 

The 2D process simulator TSUPREM-4 from Technology
Modeling Associates Inc. of Sunnyvale, California was used
to simulate scaled MOSFET device structures. The inputs to
TSUPREM-4 are the implant and oxidation steps that would
be used in the actual process. The process architecture
assumed is similar to current CMOS processes, employing
shallow source/drain extensions and deeper main source/
drain regions followed by silicidation.

The 2D device simulator MEDICI, also from Technology
Modeling Associates Inc., was used to predict the electrical
characteristics of the device structures from TSUPREM-4.
Here we use field-dependent mobility models that have been
benchmarked to the HP CMOS10 process. Iterative simulations
with TSUPREM-4 and MEDICI were performed to determine
the requirements on junction depth and channel doping
profile to ensure proper threshold and subthreshold behavior.
Fig. 1 shows the device structures resulting from these
simulations for each generation from 0.35 μm down to 0.07 μm.
For L_g less than 0.15 μm, retrograde channel doping profiles
are needed to control the subthreshold characteristics.

Figs. 2 through 5 summarize the results of this scaling study.
Fig. 2 shows the scaling of Tox with Lg. These two must
scale together to get adequate performance improvement.
Constant-field scaling dictates that Vdd must decrease
proportionally to Tox, maintaining a peak oxide field of 4 MV/cm.
For example, this results in Tox = 2.5 nm and Vdd = 1.0V for
the Lg = 0.1 µm generation.
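
The constant-field relationship can be checked with a few lines of
arithmetic. The sketch below is illustrative only: it assumes the
peak oxide field is simply Vdd/Tox (ignoring polysilicon depletion
and quantum effects), and the function names are hypothetical.

```python
# Peak oxide field target quoted in the text: roughly 4 MV/cm.
PEAK_FIELD_MV_PER_CM = 4.0

# Unit conversion: 1 MV/cm = 1e6 V / 1e7 nm = 0.1 V/nm.
V_PER_NM_PER_MV_PER_CM = 0.1


def vdd_for_tox(tox_nm, field_mv_per_cm=PEAK_FIELD_MV_PER_CM):
    """Supply voltage (V) giving the target peak field across tox_nm of oxide.

    Assumes field = Vdd / Tox, a simplification made for this sketch.
    """
    return field_mv_per_cm * V_PER_NM_PER_MV_PER_CM * tox_nm


def peak_field_mv_per_cm(vdd_v, tox_nm):
    """Peak oxide field (MV/cm) for a given supply voltage and oxide thickness."""
    return vdd_v / tox_nm / V_PER_NM_PER_MV_PER_CM


if __name__ == "__main__":
    for tox in (10.0, 5.0, 2.5):
        print(f"Tox = {tox:4.1f} nm -> Vdd = {vdd_for_tox(tox):.2f} V")
```

Under this simplification, halving the oxide thickness halves the
supply voltage that keeps the field at 4 MV/cm.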

Fig. 3 shows the scaling of effective channel length (Leff)
and the source/drain extension junction depth (Xj). For the
0.1-µm generation, Leff is about 0.07 µm and Xj must be nearly
50 nm. The series resistance of the source/drain extension
must decrease even as the junction depth also decreases.
This requires higher doping levels in the extension region and
carefully minimized spacer widths.

Fig. 4 shows the scaling of threshold voltage (Vt). Here Vt is
kept at 20% of Vdd to maintain adequate current drive. This
yields Vt = 0.2V for the 0.1-µm generation. Unfortunately,
since off-state current varies exponentially with Vt, reducing
Vt leads to much higher off-state leakage current (100 nA/µm
for the 0.1-µm generation) than in current CMOS
technologies. Here the simulations are tailored to predict the
nominal leakage. Worst-case leakage would be approximately
one order of magnitude higher for the 0.1-µm case.

Fig. 5 shows the scaling of drive current and total gate
capacitance. Because of the simultaneous scaling of Lg, Tox,
Vdd, and Vt, the current and capacitance do not change much
from one generation to the next. Therefore, the gate delay
metric (CV/I) decreases primarily because of the decreasing
supply voltage.

Device simulators allow one to examine the internal
distributions within the device. Fig. 6 shows the lateral electric
field along the channel for each of the device structures in
Fig. 1. Even though Vdd decreases as shown in Fig. 2, the
peak electric field near the drain continues to increase as
Lg decreases. However, the width of the high-field region
decreases, giving the electrons less and less distance to reach
equilibrium with the electric field. When this "nonlocal" effect
is included in MEDICI, the electron temperature can be
calculated as shown in Fig. 7. Here, even though the peak field
increases, the electron temperature decreases as Lg
decreases. Thus, we expect that the reliability issues related
to high-energy charge carriers will become less important in
future generations of technology.

Gate Delay Simulations 

MEDICI was used to generate a full set of IV curves for each
of the devices in Fig. 1. IC-CAP, an HP software product for
modeling semiconductor devices, was then used to extract a
SPICE model for each device. Only the NMOS devices were
actually simulated. The PMOS models were created from the
NMOS models with appropriate modifications in mobility
and series resistance to yield half the current drive of the
corresponding NMOS. These device models were then used
to simulate inverter chains as shown in Figs. 8 and 9. The
load capacitance was varied to approximate fanouts of 3
and 7. Interconnect loading was ignored. The results are
shown in Figs. 10 and 11. Fig. 10 shows that the gate delay
improves about 30% per generation with the scaling
described in the previous section. This is nearly the same as
the historical trend of previous generations. Note that for
Lg = 0.1 µm the gate delay (fanout = 1) is less than 15 ps.
This is faster than the best that can currently be obtained
with bipolar ECL. Fig. 11 shows the dependence of gate
delay on the power supply. The stars denote the operating
point from constant-field scaling. Note that these highly
scaled devices offer high-speed operation even at low supply
voltages. For example, the 0.1-µm generation should yield
23-ps gate delay (fanout = 1) even with Vdd = 0.5V. This
would be excellent for low-power applications assuming
that the high off-state leakage could be dealt with.

Off-State Leakage 

The previous sections show that constant-field scaling of
MOSFETs leads to a continuation of the historical trends of
gate-level performance improvement. However, this comes
at the price of exponentially increasing off-state leakage
currents. For example, if an advanced circuit had 50 million
micrometers of device width producing leakage current at
1 µA/µm, the quiescent supply current would be 50A. Clearly
this is unacceptable. There are several proposals for dealing
with this problem and I will briefly discuss some of them in
this section. At this time we do not know the best way to
deal with this problem.

Fig. 9. Inverter switching waveforms at nodes 1 and 2 of Fig. 8.

Fig. 12. Drain current and off-state leakage current Ioff versus
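
The quiescent-current estimate is a simple product of total device
width and leakage per unit width. A minimal sketch, with a
hypothetical function name and a 1.0V supply assumed purely for the
standby-power line:

```python
def quiescent_current_a(total_width_um, leakage_ua_per_um):
    """Total off-state supply current in amperes.

    total_width_um    -- summed width of all leaking devices, in micrometers
    leakage_ua_per_um -- off-state leakage per micrometer of width, in uA/um
    """
    return total_width_um * leakage_ua_per_um * 1e-6


if __name__ == "__main__":
    width_um = 50e6  # 50 million micrometers of device width
    for label, leak in (("nominal, 0.1 uA/um", 0.1), ("worst case, 1 uA/um", 1.0)):
        current = quiescent_current_a(width_um, leak)
        # Standby power at an assumed 1.0 V supply (illustrative value only).
        print(f"{label}: {current:.1f} A, {current * 1.0:.1f} W at 1.0 V")
```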

One obvious solution to control quiescent power
consumption is to put almost all the circuit in power-down mode at
any instant and activate only those blocks that are being
accessed. This system-level type of solution is beyond the
scope of this paper and needs to be evaluated by the design
community.

Another possible solution that has been proposed involves
multiple threshold devices in the same technology. For
example, the 0.1-µm generation could offer FETs with Vt =
0.2V and Vt = 0.4V. The low-Vt FETs could be used for
speed-critical paths and the higher-Vt FETs could be used for tasks
for which speed is not as important.

After modifying the doping profiles in TSUPREM-4 to get
higher thresholds, the MEDICI simulations were repeated
and new SPICE models extracted. Fig. 12 shows the resulting
drive current and off-state current for various values of Vt in
the 0.1-µm generation. Fig. 13 shows the gate delay as a
function of Vt. From these graphs, FETs with Vt = 0.4V
would yield gate delays about 80% longer than Vt = 0.2V but
with off-state currents reduced by nearly three orders of
magnitude. Again, the off-state currents shown are for
nominal devices and worst-case would be higher. This approach
is conceptually easy to implement in any technology.
However, it increases the complexity of both the process and the
circuit design.
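
The exponential dependence of off-state current on threshold voltage
can be sketched with the usual subthreshold-swing model. This is not
the simulation flow described in the text: the 70 mV/decade swing is
an assumed value chosen for illustration, the 100 nA/µm reference at
Vt = 0.2V is taken from the text, and the function names are
hypothetical.

```python
def ioff_na_per_um(vt_v, ioff_ref_na_per_um=100.0, vt_ref_v=0.2,
                   swing_mv_per_decade=70.0):
    """Off-state current (nA/um) for threshold vt_v, scaled from a reference.

    Model: Ioff drops by one decade for every swing_mv_per_decade of added Vt.
    The 70 mV/decade default is an assumption, not a value from the article.
    """
    decades = (vt_v - vt_ref_v) * 1000.0 / swing_mv_per_decade
    return ioff_ref_na_per_um * 10.0 ** (-decades)


def implied_swing_mv_per_decade(delta_vt_v, decades_of_reduction):
    """Swing implied by a given leakage reduction over a given Vt increase."""
    return delta_vt_v * 1000.0 / decades_of_reduction


if __name__ == "__main__":
    for vt in (0.2, 0.3, 0.4):
        print(f"Vt = {vt:.1f} V -> Ioff = {ioff_na_per_um(vt):.3g} nA/um")
    # A 0.2 V increase giving three full decades would imply this swing:
    print(f"{implied_swing_mv_per_decade(0.2, 3.0):.1f} mV/decade")
```

With the assumed swing, raising Vt from 0.2V to 0.4V cuts leakage by
a little under three decades, in line with the trend the text
describes.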

Fully depleted (FD) silicon-on-insulator (SOI) devices have
been proposed to reduce off-state current for a given Vt.
These devices have a steeper subthreshold slope than
conventional bulk devices, thus reducing off-state current
without increasing Vt. However, single-gate FD SOI devices are
difficult to scale into the deep submicrometer regime.
Dual-gate FD SOI devices scale much better but are very
complicated to make. These difficulties, coupled with the material
quality and availability issues, make the FD SOI device
an unlikely candidate for future generations of high-speed
digital technology.

If no other solution for high Ioff can be found, then Vt cannot
be scaled lower than a certain point. For example, if one
needed to keep Ioff (nominal) at 1 nA/µm, then Vt (nominal)
could not go below about 0.35V. We can apply this to the
0.1-µm generation (Tox = 2.5 nm) and resimulate the device
with Vt = 0.35V. After compact model extraction and inverter
simulations, we find that Vdd must be increased to 1.8V to
get the same performance as shown in Fig. 10 for the 0.1-µm
generation. At Vdd = 1.8V and Vt = 0.35V, the device
simulations predict a drive current of slightly over 1 mA/µm
(NMOS). The peak oxide field would be over 7 MV/cm and
the peak electron temperature would be about 3300 K at
Vd = Vg = 1.8V (compare to Fig. 7). Even if we could obtain
this very high drive current, it is questionable whether such
a device could be created with adequate reliability. In any
case, it is clear from this discussion that ceasing threshold
voltage scaling would have a crucial impact on future device
technologies.

Conclusions 

We have explored MOSFET scaling into the future,
extrapolating past scaling trends in channel length and gate oxide
thickness. This scaling requires ever-shallower junction
profiles and, below Lg = 0.15 µm, retrograde channel profiles.
Constant-field scaling applied to Vdd and Vt continues the
historical trend of about 30% improvement in gate delay per
generation. In this era MOSFET drive current remains nearly
constant from one generation to the next and most of the
performance improvement comes from the decreasing
supply voltage. However, this performance comes at the price
of exponentially increasing off-state leakage. Possible
alternatives to this problem were discussed briefly but no clear
resolution is available at this time. Clearly, this is an area
where the design and technology communities must work
together to develop an optimal roadmap for future device
scaling.

Frequency Modulation of System
Clocks for EMI Reduction



This paper focuses on clock dithering as an on-chip technique for EMI
reduction. It is a survey paper based on information gathered from inside
and outside HP's Integrated Circuit Business Division (ICBD). It reviews the
basic concept, the work that has been done at ICBD and elsewhere, ICBD
customer experiences, and lessons drawn from these experiences about
design, effectiveness, and customer implementation with ICBD.

by Cornelis D. Hoekstra



The proliferation of electronic products in the home and
office is putting increasing pressure on every product to
reduce its electromagnetic interference (EMI). At HP's
Integrated Circuit Business Division (ICBD), several different
methods have been used to help deal with EMI directly
on-chip, among them frequency modulation of the system clock,
also called clock dithering, and control of pad output rise
and fall times over process, voltage, and temperature (PVT)
variations. This latter method is also called adjustable
output pad (AOP) control, and sometimes includes
programmable adjustment for capacitive loading.

This paper focuses on clock dithering as an on-chip
technique for EMI reduction. It reviews the basic concept, the
work that has been done at several different ICBD design
centers and elsewhere, ICBD customer experiences with
that work, and lessons drawn from these experiences about
design, effectiveness, and customer implementation with
ICBD. The paper closes with a brief review of the costs and
benefits of implementing dithering and a summary of what
customers can expect when working with ICBD. This paper
does not aim to be a comprehensive description of dithering
circuitry and mathematics, but rather a more narrative
description of experiences and rules of thumb. See reference
1 for a more detailed discussion of circuit implementation.

Typical Clock Dithering Circuit

The key idea, illustrated by Fig. 1, is the control of the
frequency of the voltage-controlled oscillator (VCO) of a
phase-locked loop by appropriate division of the reference clock
(RefClk) by the input divider (Q) and of the VCO clock (fvco)
by the feedback divider (P). The dividers all consist of digital
counters. The divided digital waveforms are compared by
the phase-frequency detector, which puts out an up or down
signal pulse depending on whether the P waveform lags or
leads the Q waveform. The width of the up or down pulse is
proportional to the amount of lag or lead. The up or down
pulse is fed to the charge pump and low-pass filter block,
which translates it to a change in the VCO control voltage
(vcntl). The VCO control voltage is repeatedly adjusted by up
or down pulses until the VCO frequency fvco is such that the
P and Q waveforms align (i.e., the up and down pulses are of
nearly zero width). At this point RefClk/Q = fvco/P. Since the
VCO frequency is divided by the output divider D, the actual
output signal PllClk = fvco/D. Thus, the output frequency at
stable operation is dictated by the values of the Q, P, and D
dividers, and by appropriate substitution can be written as
PllClk = P(RefClk)/(QD). Thus, for example, if RefClk = 16 MHz,
P = 50, Q = 10, and D = 5, the output frequency PllClk is the
same as the input frequency, 16 MHz.

If the P counter endpoint is 49, the output frequency is
15.68 MHz, 2% less than 16 MHz. Therefore, a simple way to
achieve dithering is to change the P counter endpoint back
and forth between 50 and 49 at some reasonable rate.
Controlling this rate is the job of the M counter; that is, the
P counter endpoint is changed each time the modulation
counter (M) reaches its endpoint. A typical value for M might
be 16, so that the modulation frequency is then 16 MHz/(QM)
= 100 kHz. In practice, either the Q counter, the P counter,
or both can be changed to achieve different target
frequencies. Furthermore, by using both the rising and falling edges
of the VCO clock, P can effectively have values such as 50.5
and 49.5, thus allowing a symmetric deviation of ±1% about
a center value of 50.
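
The divider arithmetic can be sketched in a few lines. The functions
below only evaluate the steady-state relations PllClk = P(RefClk)/(QD)
and RefClk/(QM); they do not model loop dynamics, and the function
names and the example values (a 16-MHz reference with Q = 10, D = 5,
M = 16) are assumptions chosen for illustration.

```python
def pll_output_hz(ref_hz, p, q, d):
    """Steady-state output frequency: PllClk = P * RefClk / (Q * D)."""
    return p * ref_hz / (q * d)


def modulation_rate_hz(ref_hz, q, m):
    """Rate at which the P endpoint is switched: RefClk / (Q * M)."""
    return ref_hz / (q * m)


def deviation_percent(p_nominal, p_alternate):
    """Frequency deviation, in percent, from changing the P endpoint."""
    return 100.0 * (p_alternate - p_nominal) / p_nominal


if __name__ == "__main__":
    ref = 16e6  # assumed reference clock, Hz
    for p in (50, 49, 50.5, 49.5):
        f_out = pll_output_hz(ref, p, q=10, d=5)
        print(f"P = {p:>4}: {f_out / 1e6:.2f} MHz "
              f"({deviation_percent(50, p):+.1f}%)")
    print(f"switching rate: {modulation_rate_hz(ref, q=10, m=16) / 1e3:.0f} kHz")
```

Half-integer values of P stand in for the use of both VCO clock
edges, which gives a deviation that is symmetric about the center
frequency.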

The scheme described above can be thought of as square
wave modulation because the phase-locked loop is asked
to jump instantaneously from one frequency to another.
Because real systems don't respond that way, and because
of deliberate filtering to moderate this sudden transition, the
actual frequency modulation waveform looks more like a
ringing square wave. Typical simulation results for startup,
lock, and modulation for this design, using the SABER
analog/digital mixed-signal simulation tool, are shown in
Figs. 2a and 2b. The VCO control voltage vcntl represents
frequency, up and down are as described above, and p_sel is
the output of the modulation divider (M). Frequency
deviation of the dithered clock is typically ±1% to ±2%, and
modulation frequency is typically 50 kHz to 250 kHz.
Cycle-to-cycle jitter has ranged from well under 0.5% to as much as
2% for designs to date.

Fig. 2. (a) SABER simulation of the startup, lock, and dithering
regions of phase-locked loop operation. The VCO control voltage
vcntl is proportional to the output frequency. (b) Closeup of the
dithering waveform. Note the ringing square wave appearance.

Square wave modulation as just described has been used
successfully in a number of products to reduce EMI emission
sufficiently to allow products to pass FCC testing when they
otherwise could not. However, in some applications, the
cycle-to-cycle jitter associated with this modulation method
cannot be tolerated by the system (this is discussed further
below). Over the last year this drawback has been addressed
at ICBD by the development of triangle wave modulation.
This method uses delta-sigma methods to step the
phase-locked loop frequency more gradually from a low-frequency
target to a high-frequency target and back again, resulting in
what is usually called triangle wave frequency modulation.
The technique greatly reduces cycle-to-cycle jitter of the
phase-locked loop output clock compared to square wave
frequency modulation. It also provides some improvement
in EMI reduction because of flatter spectral response
between the upper and lower frequency targets. This new
phase-locked loop design has been successfully
implemented in ICBD's CMOS14TB process on two ASICs for HP
products. The new modulation method is less sensitive to
process variation than previous methods, and should therefore
be easy to port to future process generations and
second-source fabrication facilities. Fig. 3 shows a closeup of the
triangle-wave modulation waveform of this new design
(startup is similar to square wave modulation).

HP Experiences with Dithering 

ICBD Customer Divisions. A number of HP products have used
clock dithering to date. For one product, two different
modulation schemes were designed and manufactured by two
independent organizations using different processes. For
one design and process the modulation waveform looked
like a ringing square wave that substantially overshot the
target frequencies, while for the other design and process
the modulation waveform was more triangular because of
its use of a smaller-bandwidth filter. The advantage of the
triangular version was that changes in period from cycle to
cycle were gradual (less cycle-to-cycle period variation or
jitter), and the spectrum was smoothly spread across many
frequencies between the minimum and maximum.
However, because the narrow-bandwidth filter loop response was
so slow, for worst-case slow conditions the VCO frequency
never reached its target value, limiting total frequency
deviation and thus EMI reduction. In the ringing square wave
version, the loop response was very fast, so that target
frequencies were reached and even exceeded because of
overshoot, over all process conditions. For this square wave
scheme, the frequency was distributed over a wider range,
although less evenly. Fig. 4 shows simulated VCO control
voltage waveforms for these two different designs for
qualitative comparison.

Fig. 3. Closeup of the triangle wave modulation waveform.
Startup and lock are similar to square wave startup and lock of

Conducted EMI measurements of the frequency spectrum of
the system clock pin showed lower peak values for the
square-wave-modulated part than for the
triangular-wave-modulated part, which appeared to be a result of greater
spectrum spreading because of square wave overshoot.
However, radiated emissions from the boards using parts
designed with triangular wave modulation exhibited less
noise overall. Although the reason for this was not proved, it
appeared to be the result of slower overall switching
speeds of the process used to manufacture this version of
the design. Nevertheless, for parts from both processes,
dithering had a beneficial effect on EMI. Fig. 5 shows
conducted frequency spectra for one of the harmonics of the
dithered clock measured on the clock pin of each part.

For the product described above, the EMI reduction observed
for square wave modulation did not seem to match the
reduction predicted mathematically by standard FM theory
based on the deviation and the modulation rate. Therefore,
for another product, a dithering phase-locked loop using
square wave modulation was made programmable to a
number of different deviation and modulation values to make it
possible to explore EMI reduction based on these two
parameters. This gave the interesting result that EMI
reduction was optimum somewhere between very slow and very
fast modulation, contrary to standard FM theory. The
reason for this is not really very surprising and is discussed
further below (see "Design Considerations").

The complex relations described above between overall EMI
reduction and modulation waveshape, process characteristics
(e.g., intrinsic switching speeds), measurement method
(e.g., conducted versus radiated spectra), and modulation
rate caused substantial confusion and disagreement about
which modulation method was better for EMI reduction,
and was a primary stimulus for writing this paper.

Fig. 5. Conducted spectrum of a dithered clock harmonic for (a)
a square wave modulation design and (b) a triangle wave
modulation design. Spectra were filtered through a CISPR16-compliant
quasipeak detector (see "EMI Measurement Standards" on page
105).

Other HP Divisions. Another HP division has taken a
different approach. This division developed an all-digital
gate-array part that receives a 40-MHz input reference signal and
outputs a clock that varies pseudorandomly near 14 MHz.
This pseudorandom 14-MHz signal is then fed to an IC
containing a phase-locked loop, which smooths the
pseudorandom 14-MHz signal and thus effectively generates a dithered
system clock. The pseudorandom 14-MHz signal is generated
in a deterministic way that makes it exactly 14.31818 MHz
on average, so that it can be used as a real-time clock.
Modulation is synchronized to the horizontal sync signal of the
video display so that no random jitter is observed in the video
picture. Fig. 6 illustrates the appearance of the
pseudorandom 14-MHz clock compared to the 40-MHz input clock and
the ideal 14.31818-MHz clock. Two modulation cycles are
shown.
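
One way to see how a 40-MHz input can yield a clock that is exact
on average is a plain phase accumulator. The sketch below is not the
pseudorandom scheme used by that division, whose details are not
given here; it is a hypothetical illustration, and it assumes that
14.31818 MHz denotes 315/22 MHz, so that the output-to-input ratio is
63/176.

```python
from fractions import Fraction


def accumulator_edges(num=63, den=176, cycles=176):
    """Return one flag per input clock cycle; 1 marks an output clock edge.

    Each input cycle adds num to an accumulator; an overflow past den
    produces an output edge, so the long-run edge rate is num/den.
    """
    acc = 0
    edges = []
    for _ in range(cycles):
        acc += num
        if acc >= den:
            acc -= den
            edges.append(1)
        else:
            edges.append(0)
    return edges


def average_output_hz(edges, input_hz=40_000_000):
    """Exact average output frequency as a Fraction."""
    return Fraction(input_hz) * sum(edges) / len(edges)


def edge_spacings(edges):
    """Spacing, in input clock cycles, between consecutive output edges."""
    positions = [i for i, flag in enumerate(edges) if flag]
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    edges = accumulator_edges()
    print(float(average_output_hz(edges)))    # average frequency in Hz
    print(sorted(set(edge_spacings(edges))))  # distinct edge spacings
```

The individual output periods jump between two and three input
cycles, which is the kind of coarse variation a downstream
phase-locked loop would have to smooth.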

Non-HP Clock Dithering Products. A standalone product that
provides a clock whose frequency varies very smoothly over
various ranges of center frequency and deviation is available
in the industry. However, the product is expensive, takes up
board space, and requires additional surface mount parts for
operation, further adding to board costs. In addition, the
modulation frequency is fixed and cannot be synchronized
with product operation, such as the horizontal sync signal
of a video display to prevent visual distortion due to jitter.
The recently developed ICBD dithering phase-locked loops
described above offer smoothly varying frequency
modulation without these disadvantages, and at very low cost.

Design Considerations 

EMI Reduction versus Modulation Waveform. A square wave
can be described as a linear superposition of the odd
harmonics of a fundamental sinusoid whose frequency is equal
to the frequency of the square wave. Thus, a lot can be
understood about square wave modulation of a square wave by
considering square wave modulation of a sine wave. The
discussion below is given with this in mind.

FM theory predicts that the power of a sine wave whose
frequency is modulated by another sine wave is distributed
across individual small peaks between the minimum and
maximum frequency endpoints, separated by a frequency
difference equal to the modulation frequency. Thus, as
modulation frequency is decreased, there are more power peaks,
but with lower peak values and spaced closer together. On
the other hand, the spectrum of a sine wave whose
frequency is modulated by a square wave contains just two peaks,
regardless of how slowly the sine wave is modulated. These
peaks are at the minimum and maximum frequency
deviation points, and each contains half the total power of the
unmodulated sine wave. This can be intuitively understood
by realizing that the modulated signal spends virtually all of
its time stabilized at one or the other of the two frequency
endpoints.
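
For the sinusoidal case, standard FM theory gives the power in the
sideband at offset n times the modulation frequency as the square of
the Bessel function J_n(beta), where beta is the ratio of peak
deviation to modulation frequency. The sketch below evaluates this
with Bessel's integral in plain Python. The function names and the
160-kHz deviation are hypothetical, and the ideal square wave figure
simply restates the two equal peaks described above.

```python
import math


def bessel_j(n, beta, steps=2000):
    """J_n(beta) for integer n via Bessel's integral (trapezoid rule).

    J_n(x) = (1/pi) * integral from 0 to pi of cos(n*t - x*sin(t)) dt
    """
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end weights
        t = k * h
        total += weight * math.cos(n * t - beta * math.sin(t))
    return total * h / math.pi


def sideband_powers(deviation_hz, mod_hz):
    """Relative power of each sideband n for sine-wave FM (total = 1)."""
    beta = deviation_hz / mod_hz
    n_max = int(beta) + 10  # sidebands far beyond beta are negligible
    return {n: bessel_j(n, beta) ** 2 for n in range(-n_max, n_max + 1)}


def peak_reduction_db(deviation_hz, mod_hz):
    """Drop of the largest spectral line relative to the unmodulated carrier."""
    return -10.0 * math.log10(max(sideband_powers(deviation_hz, mod_hz).values()))


# Ideal square wave modulation: two lines, each with half the power.
SQUARE_WAVE_PEAK_REDUCTION_DB = 10.0 * math.log10(2.0)

if __name__ == "__main__":
    for mod in (100e3, 50e3, 10e3):
        print(f"sine FM, fm = {mod / 1e3:5.0f} kHz: "
              f"{peak_reduction_db(160e3, mod):.1f} dB")
    print(f"ideal square wave: {SQUARE_WAVE_PEAK_REDUCTION_DB:.1f} dB")
```

Lowering the modulation frequency raises beta, which spreads the
power over more lines and lowers the tallest one, whereas the ideal
square wave result does not depend on the modulation rate at all.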

As the modulation rate is increased, a real system designed
to do square wave modulation cannot actually respond
instantaneously in true square wave fashion and spends
relatively more time in transition between frequencies and less
time stabilized at its frequency extremes. Thus, the system
starts to look more like a sinusoidally modulated system,
and the two square wave power peaks tend to distribute into
multiple smaller peaks. Finally, as the modulation rate is
increased even further, the number of peaks will tend to
decline again and their individual peak values will increase,
in accordance with FM theory as discussed above.

In other words, in a real system designed to do square wave
modulation there is a point of maximum EMI reduction
between very fast and very slow modulation rates. The exact
location of this point varies depending on phase-locked loop
and product characteristics. This phenomenon was verified
for the product with programmable parameters described
above. For this product's particular phase-locked loop and
product characteristics, measurements showed that the
greatest reduction occurred at a point where the ratio of
frequency deviation to modulation frequency was about 1.4.

As discussed near the beginning of this article, aside from
reducing cycle-to-cycle jitter, the recent development of
triangle wave modulation at ICBD also improves on the EMI
reduction limitations of square wave modulation just
described by more smoothly spreading spectral energy across
the entire range of frequencies between the maximum and
minimum endpoints at low modulation rates.

Programmability. It is valuable to provide modulation,
deviation, and dithering on/off programmability in the
phase-locked loop. These features should be easy to control during
both phase-locked loop test mode and normal operation to
allow rapid and effective evaluation of silicon and to optimize
the product's EMI characteristics. This kind of
characterization adds knowledge to the phase-locked loop database, and
with appropriate programmability can also be used to assess
product margin over a range of fixed operating frequencies.

Frequency Synthesis. Frequency synthesis is a built-in option
for dithering phase-locked loops. The same basic method
used to create frequencies slightly smaller or larger than the
reference can be used to synthesize nearly any arbitrary
frequency within phase-locked loop performance limitations.
This allows the use of lower-frequency crystals (<20 MHz or
so) operating in fundamental mode to generate the frequency
reference. These crystals are typically less expensive, require
fewer extra components, and cause fewer startup problems
than higher-frequency crystals, which need to operate in
overtone modes.

Spectrum Overlap. When deciding on deviation values, the
designer should keep in mind the potential for spectrum
overlap at higher harmonics. This can occur when the
output frequency is relatively low and the frequency deviation
is relatively high, and can tend to cancel out the expected
EMI reduction for high-frequency components.
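
A rough way to estimate where overlap begins is to treat harmonic n
of a clock at f0 with fractional deviation d as occupying the band
from n·f0·(1 − d) to n·f0·(1 + d). Adjacent bands touch when
n(1 + d) ≥ (n + 1)(1 − d), that is, when n ≥ (1 − d)/(2d). The sketch
below evaluates this bound; it is an idealization that ignores
overshoot and sideband tails, and the function name and the 16-MHz
clock are hypothetical.

```python
import math


def first_overlapping_harmonic(deviation_fraction):
    """Lowest harmonic n whose spread band reaches that of harmonic n + 1.

    deviation_fraction is the one-sided deviation, e.g. 0.01 for +/-1%.
    Derived from n * (1 + d) >= (n + 1) * (1 - d).
    """
    d = deviation_fraction
    return math.ceil((1.0 - d) / (2.0 * d))


if __name__ == "__main__":
    f0_hz = 16e6  # assumed output clock, for illustration only
    for d in (0.005, 0.01, 0.02):
        n = first_overlapping_harmonic(d)
        print(f"+/-{d * 100:.1f}%: overlap from harmonic {n} "
              f"(about {n * f0_hz / 1e6:.0f} MHz)")
```

A lower clock frequency or a larger deviation moves the first
overlapping harmonic down into the frequency range where emissions
are measured.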

Mixing Dithered and Nondithered Clocks. Dithered and
nondithered clock domains on the same chip usually must be
treated as unrelated clock domains, and therefore should be
avoided if possible. Consider the case of two clock domains,
one dithered and one undithered. Given a point where the
rising edges of the two clocks go up simultaneously, the
edges of the dithered clock that follow will alternately lead
or lag the corresponding edges of the reference clock over
time. This is sometimes referred to as clock slip. Clock slip
is both difficult to control accurately and difficult to
measure accurately, particularly in a production environment.
Designing a system with asynchronous domains is typically
messy, complicated, and hard to simulate, so it should be
avoided if possible. For example, in the case of video clocks,
rather than have a nondithered clock to prevent visual jitter
in the video, modulation can be synchronized to the
horizontal sync signal. On the other hand, if separate clock domains
are necessary, they can be used; they just require more
careful engineering.

System Simulation. We recommend that our customers
simulate a behavioral/structural Verilog model in their chip
designs to catch unexpected problems. Examples of problem
areas exposed when simulating with such a model are
improper multiplexing and I/O control, inadequate pad drive
strength for fast (system clock speed) output test signals,
asynchronous interfaces, and marginal performance with
respect to operating frequency.



A mixed-signal (i.e., analog and digital) simulation tool is
indispensable for phase-locked loop design and simulation.
The tool should have links to Verilog and C, and ideally
should offer built-in FFT analysis capability, which can be
useful for evaluation of the spectral characteristics of various
dithering alternatives.

Multiple Phase-Locked Loops. Careful attention must be paid
to systems with multiple phase-locked loops. Some major
system IC components (e.g., microprocessors) contain
phase-locked loops of their own, used for things such as
frequency synthesis and generation of nonoverlapping
clocks. These phase-locked loops must be able to track the
dithering of the reference phase-locked loop's output
adequately so that clock skew does not become excessive
across the system. Unfortunately, assessing this
phase-locked loop tracking ability in simulation can be difficult.
A dithering phase-locked loop with very low cycle-to-cycle
jitter (i.e., very slowly changing frequency) can help avoid
the need for this simulation, and is a major reason for
ICBD's development of triangular modulation.

Product Evaluation

Silicon Process Variation. Process speed can be an important
factor affecting the apparent effectiveness of dithering. For
current designs, process speed has a fairly strong effect on
VCO gain as well as clock tree and output driver switching
speed. Thus, we recommend that our customers make a
point of looking at both fast and slow parts when evaluating
the effectiveness of dithering.

EMI Measurement Standards. The CISPR16 EMI measurement
standard is not absolute, and different measurement tools
may all meet the standard yet give different results. The
method by which EMI emissions are to be measured is
defined by a standard called CISPR16-1. This standard is
intended to approximate the characteristics of typical
radio-frequency receivers. For frequencies from 30 MHz to 1 GHz,
power is averaged for a passband whose nominal width is
120 kHz at 6 dB. However, the standard allows passbands
ranging from 100 kHz to 140 kHz at 6 dB, so results vary
depending on the measurement filter chosen. Peak values
that change at a rate faster than 10 to 20 kHz are ignored
by using a quasipeak detector defined by the standard. The
time-constant characteristics of the quasipeak detector are
again given as a range of allowable values. This ambiguity
in both filter and peak detector characteristics means one
should be careful when comparing EMI measurements from
different tests.

Conducted versus Radiated Spectra. It is important to
differentiate between conducted and radiated spectra. When a
probe is directly touched to the clock pin of a part, the
conducted spectrum observed is a fairly direct representation of
the spectral composition of the clock signal. However, when
electromagnetic emissions are monitored at a distance from
a finished product, the clock signal has been significantly
"filtered" by the antenna characteristics of the product and
the measurement environment. In other words, products act
as frequency-selective antennas, so conducted and radiated
spectra can be and usually are quite different. This increases
the desirability of phase-locked loop programmability to find
optimum performance. Consequently, this also makes ease
of controllability and access for measurement important.

Effectiveness as a Function of Frequency. Dithering is
intrinsically more effective at higher harmonics and less effective
at lower harmonics. This is simply because the absolute
value of frequency deviation increases linearly with harmonic
number, so that spectral energy is spread over a larger range
at higher harmonics, while the width of the filter over which
spectral energy is measured is fixed. Fortunately, high
frequency is exactly where certain customers have their most
severe problems. For example, one HP printer division tends
to have many components on the printer board, and relies
heavily on shielding to limit emissions. Low-frequency noise
seems to be effectively contained, but high-frequency (short
wavelength) noise tends to leak out through openings in the
shielded box. For another HP printer division, on the other
hand, low-frequency radiation turns out to be the primary
noise source, at least in part because of unique resonant
conditions created by printer cabling, with the result that
dithering is less effective.
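
The frequency dependence described above can be made concrete with a
rough calculation. The following is a minimal sketch, not a method from
this article: it assumes a hypothetical 50 MHz clock dithered by plus or
minus 0.5 percent, and it uses a common first-order estimate in which the
peak reduction is about 10 log10(spread width / measurement bandwidth).
That estimate ignores the modulation waveform and the quasipeak detector,
so the numbers are indicative only.

```python
import math

RBW_HZ = 120e3  # nominal measurement bandwidth cited in the text


def spread_width_hz(f_clock_hz, deviation_fraction, harmonic):
    """Peak-to-peak width over which the given harmonic is smeared.

    The deviation grows linearly with harmonic number, as the text notes.
    """
    return 2.0 * deviation_fraction * f_clock_hz * harmonic


def estimated_reduction_db(f_clock_hz, deviation_fraction, harmonic,
                           rbw_hz=RBW_HZ):
    """First-order estimate: energy spread evenly, filter width fixed."""
    width = spread_width_hz(f_clock_hz, deviation_fraction, harmonic)
    if width <= rbw_hz:
        return 0.0  # all of the energy still falls inside one passband
    return 10.0 * math.log10(width / rbw_hz)


if __name__ == "__main__":
    # Hypothetical clock and deviation, chosen only for illustration.
    for n in (1, 3, 5, 9, 15):
        print(n, round(estimated_reduction_db(50e6, 0.005, n), 1))
```

Because the filter width stays fixed while the spread width scales with
the harmonic number, the estimated reduction rises by 10 dB for every
tenfold increase in harmonic number under these assumptions.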

Dithering versus PVT/AOP. The PVT or AOP technique
(defined at the beginning of this article) consists of controlling
the turn-on times or the rise and fall times of IC output
drivers (pads), ideally keeping these times constant over
PVT variations, and sometimes adjusting for capacitive load
as well. This method is also intended to keep ground bounce
and signal reflections constant over PVT variations. It
requires circuitry to monitor the PVT operating point of the IC
(for example, by counting cycles of a free-running ring
oscillator with respect to the reference clock), and then adjusting
the driver or predriver current to control the drive strength
or turn-on time of the driver, respectively. This technique
requires considerable effort for pad design, customer
simulation, and characterization of first silicon to accurately
correlate the PVT reference to pad programming. With the
practice of second-sourcing becoming more common, much of
this work has to be repeated for each foundry. Unfortunately,
EMI reduction has been negligible, at least as observed
for HP DeskJet printers, a major user of this method. For
these products, system clock noise has turned out to be a
much greater source of EMI than pad switching noise, and
system clock noise is not helped by this method. In fact, the
low output resistance of typical PVT pads may encourage
the transmission of system clock noise from the power net
out through the output drivers and onto the product board.
In summary, PVT or AOP is still believed to have potential
benefit, especially for very fast-switching outputs, but is not
likely to realize its potential until drive control is fairly
automatic inside each pad, without requiring chip-level
programming intervention.
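
The monitoring scheme mentioned above (counting ring oscillator cycles
against the reference clock and then choosing a drive setting) might be
sketched as follows. This is an illustrative model only: the function
names, the counting window, and the threshold values are all
hypothetical, and real thresholds would come from characterization of
first silicon.

```python
def ring_osc_count(ring_freq_hz, ref_freq_hz, ref_cycles):
    """Ring oscillator cycles counted during ref_cycles reference clocks.

    Integer arithmetic is used so the model behaves like a hardware counter.
    """
    return (ring_freq_hz * ref_cycles) // ref_freq_hz


def drive_setting(count, thresholds=(180, 220, 260)):
    """Map a count to a drive-strength code.

    A high count means fast silicon, which needs less drive to hold the
    edge rate constant. Code 3 is the strongest drive (slow corner) and
    code 0 the weakest (fast corner). The thresholds are hypothetical.
    """
    code = len(thresholds)
    for limit in thresholds:
        if count >= limit:
            code -= 1
    return code


if __name__ == "__main__":
    # Hypothetical 25 MHz reference and a 32-cycle counting window.
    for ring_hz in (130_000_000, 200_000_000, 240_000_000):
        count = ring_osc_count(ring_hz, 25_000_000, 32)
        print(ring_hz, count, drive_setting(count))
```

The sketch also shows why the text calls this approach labor-intensive:
the mapping from count to drive code is a table that has to be
re-derived for each process and each foundry.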

Dithering addresses most of the limitations of PVT. The
system clock tends to be the top noise source because by
design everything happens at rising and falling edges of the
clock. This means that clock dithering will tend to spread
out all sources of noise throughout the system, since all of
them are related to the clock. Thus, the advantage of
dithering circuitry is that it is essentially a single independent block
that can be inserted into an IC design to help reduce EMI
globally, at both chip and board levels.

Verifying Testability and Compatibility. Customers need to set
aside engineering time to design their IC to make the
phase-locked loop accessible for testing and to ensure that the IC
will work with a dithered clock. In production testing the
phase-locked loop internals are typically tested while the
IC is in a special phase-locked loop test mode. During this
mode certain pins of the IC are multiplexed to the
phase-locked loop block's input and output ports for direct access
by a production tester. Special test decks written by ICBD
are then applied to the pads. The customer needs to design
the IC to accommodate this phase-locked loop test mode
configuration. The customer also is advised to simulate the
IC design at the extremes of frequency expected from the
dithered clock, and to include uncertainty in the clock edge
to account for cycle-to-cycle jitter. Finally, recall the earlier
recommendation that if dithering is used it should be applied
to the entire clock domain. If that is not possible, customer
effort will be required to design asynchronous interfaces
that do not rely on controlled phase relations between the
dithered and nondithered clock domains.
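
As a worked illustration of the simulation advice above, the shortest
and longest clock periods to cover can be estimated from the nominal
frequency, the dither depth, and the cycle-to-cycle jitter. This is a
sketch under stated assumptions: the deviation is taken to be symmetric
about the nominal frequency, and the 50 MHz, 1 percent, and 0.2 ns
figures are hypothetical values, not data from this article.

```python
def period_bounds_ns(f_nominal_hz, dither_fraction, jitter_ns):
    """Return (shortest, longest) clock period in ns to simulate.

    Assumes the dithered frequency swings symmetrically about nominal
    and that jitter can shorten or lengthen any single cycle.
    """
    f_max = f_nominal_hz * (1.0 + dither_fraction)
    f_min = f_nominal_hz * (1.0 - dither_fraction)
    shortest = 1e9 / f_max - jitter_ns
    longest = 1e9 / f_min + jitter_ns
    return shortest, longest


if __name__ == "__main__":
    short_ns, long_ns = period_bounds_ns(50e6, 0.01, 0.2)
    print(f"simulate from {short_ns:.3f} ns to {long_ns:.3f} ns")
```

The shortest period is the one that matters for setup checks, so it is
the more important of the two extremes for most synchronous logic.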

Customer Evaluation. At this point in the evolution of clock
dithering, customers should plan to spend some extra time
beyond the usual EMI characterization of their product to
characterize their systems with dithering. As experience
is gained both by customers and ICBD, this need should
decline.

Acknowledgments

This is a survey paper based on information gathered from
many different people inside and outside ICBD. People within
ICBD who provided information were Mike Oshima, Tom
Thatcher, and Doug Sojourner from California Design Center,
Charles Moore and Derek Knee from Fort Collins Design
Center, and Linda Engelbrecht from Corvallis Design Center.
The development of triangle wave modulation was by a
team headed by Linda Engelbrecht, using a concept
originated by Charles Moore. I have also obtained information
from Bob Dockey and Ron Juve of Vancouver Division, Tom
Wheless of Business LaserJet Division, Steve Smith of San
Diego Printer Division, and Dave Arnett and Bob Puckette of
Mobile Computing Division. I would like to take this
opportunity to sincerely acknowledge and thank each of these
people here for their contribution of information and
comments for this paper.

Fully Synthesizable Microprocessor
Core via HDL Porting



Microprocessors integrated in superchips have traditionally been ported
from third-party processor vendors via artwork. A new methodology uses
hardware description language (HDL) instead of artwork. Having the HDL
source allows the processor design to be optimized for HP's process in
much the same way as other top-down designs.

by Jim J. Lin



The level of integration has been rapidly increasing with
advances in semiconductor technology. Many HP design
groups have capitalized on this capability to create highly
integrated ASICs, or superchips. Superchips that integrate
conventional ASIC logic, microprocessors, embedded RAM
and ROM, and other megacell functions save cost, power,
board space, and inventory overhead and increase I/O
performance at the same time. This industry-wide trend has put
an increased burden on ASIC suppliers to come up with
megacells that are off-the-shelf, proven, and testable. As an
ASIC supplier to other HP divisions, we at HP's Integrated
Circuit Business Division (ICBD) licensed several early
microprocessors that customers demanded and
artwork-ported them into our process. All superchips built at ICBD
today contain artwork-ported microprocessors.

However, artwork porting has its limitations. It often does
not yield the best area possible in a given technology. In
addition, the process technologies of processor vendors
tend to diverge from ICBD's technology for the next
generation. The testability of these processors is also a problem in
a superchip because they require access to their functional
pins to run parallel vectors. Multiplexing the processor pins
with ASIC pins is a level of complication preferably avoided.
Customers also demand some controllability of the exact
microprocessor configuration that goes into the superchip.
For example, customers may choose to increase or decrease
the size of the cache that's included with the microprocessor
after profiling target code under different cache
configurations. Making such changes at the artwork level is a major
undertaking and often requires a lot of work for the
processor vendor as well. Presilicon verification is also virtually
nonexistent and the design often requires several mask
turns. Finally, the processor is technology dependent and
requires almost as much effort to go into a different process
as the original port.

One methodology that successfully addresses these issues is
HDL (Hardware Description Language) synthesis. A number
of underlying technologies make this methodology work.
The increased density in our standard cell technology allows
the implementation of dense data path functions and is
area-effective in general. ICBD's standard cell design flow for
top-down design methodology is robust and mature.
Synthesis is becoming more and more powerful and is capable of
highly complex designs. ICBD's RAM generation is also a
key component that delivers good performance for cache
applications. Processor vendors in the embedded market
have shifted their design paradigms as well. They are no
longer custom designing everything and are using HDL and
synthesis more and more.

The methodology of porting cores using HDL synthesis
incorporates an existing standard tool flow. This is a major
advantage. An efficient tool flow is essential in today's ASIC
market. Significant effort has gone into making ICBD's tool
flow as efficient as possible to handle the high-integration
market. The goal for this methodology is to leverage the standard
tool flow as much as possible. In essence, processor HDL
is transferred from the processor vendor. After necessary
changes in certain configurations at the HDL level, the HDL
is verified through functional simulation. It is then fed into
the synthesis tool to obtain a netlist that is subsequently fed
into the conventional tool flow.

The rest of this paper will discuss in more detail the
methodology and how it was used to implement the Coldfire 5202
microprocessor from Motorola in a test chip.

Methodology Overview

Since the processor cores developed from this methodology
are going to be used in superchips, they need to be designed
with ease of integration and customer needs in mind.
Testability, customizability, technology independence, minimum
die size, and thorough presilicon verification are all goals
that are important to delivering a successful core for
superintegration.

The testability of a processor core has different constraints
than a standalone processor because the functional pins are
not visible when the core is integrated on a superchip. Pins
of processors have traditionally been multiplexed out in test
mode, which takes additional effort, and fault grading the
functional vectors is not always easy. Our new methodology
uses full-scan test patterns that require only a few scan
ports that are needed anyway for other ASIC logic and can
be effectively fault graded to determine the quality of the
vectors. This approach to testing is compatible with HP
testing standards and minimizes the cost of testing.

Having the HDL for the processor means that changes to the
processor can be done at the source level rather than the
artwork level. While changes to the instruction set
architecture are nontrivial and are not recommended by the
processor vendor, changes to the cache and bus configurations are
within the realm of possibility. Customers can often save
area by cutting out an unused block on the processor or
reducing the cache size. The functionality of the design can
be verified before silicon to bolster confidence for first-time
success in the customized processor.

Our methodology also enables technology independence
through logic remapping. The HDL can be simply
recompiled to target a different technology, assuming that the
technology has a standard cell library capable of synthesis.
This is true for porting a processor to a new technology at
ICBD, a second source, or a dual source.

Area is a very important consideration as well. If the area of
the synthesized core is not competitive with custom-laid-out
processors, customers will not adopt this strategy regardless
of how good the methodology is. Standard cell density has
increased to the point where even data path blocks like
ALU, barrel shifter, and register files can be implemented
in a reasonable amount of area. Another flexibility in using
standard cells is that a processor core can be compiled with
the desired target frequency. If the nominal frequency is
faster than the target, cells can be sized down to save some
additional area.

Having the HDL source for the processors also means that
the design can be simulated, both at the RTL (Register
Transfer Language) level initially and at the gate level at the
end. Vectors can be run to verify the functionality and timing
of the design.

To ensure functionality, three different approaches can be
used, either alone or in combination. RTL and netlist
verification can run precaptured vectors from the processor vendor.
These vectors are diagnostics, benchmarks, or random
instructions that processor vendors themselves use. The
netlist can also be compared with the vendor's design using
formal verification methods. Lastly, an environment in
which random instructions are generated can be set up
locally to subject the design to new random instruction
testing.

Timing is verified with a combination of static and dynamic
timing analysis. An even greater advantage for the customer
is the ability for the entire superchip to be simulated in a
timing-accurate fashion since the processor is in the same
library as the ASIC logic. Previously, such system-level
simulation was only possible using a hardware modeler or a
software model that was not timing-accurate.

Design Flow 

An ICBD design automation group has a series of supported
design tool flows. A modified version of their Standard Tool
Flow 2 (STF2) is used to carry out our methodology. In this
way, our methodology leverages a proven methodology for
doing HDL-based design. Synthesizing microprocessor cores
becomes an extension of the current capability.

This paper focuses only on those aspects of the methodology
that are unique to porting microprocessor cores. The process
is illustrated in Fig. 1. The methodology incorporates
standard ICBD tool flows. For Verilog-based designs, the STF2 is
used. For VHDL (Very High-Speed Integrated Circuit
Hardware Description Language) designs, an HP proprietary
VHDL tool flow is used. These flows are simply
encapsulated as design processes in Fig. 1.

Inputs. The processor vendor needs to have the following list
of items to feed into our tool flow:

• Design Specifications. Timing, functionality, and pin
descriptions of the processor. Most of this information can be
obtained from a data book if available. For newer cores a
data book may not be available, but internal documentation
that will eventually be part of the data book will suffice.

• Behavioral HDL. A Verilog or VHDL model of the processor
core. This HDL model does not necessarily have to be
synthesizable. As long as it models the cycle-to-cycle behavior
of the design, it can be made synthesizable with some
rewriting of the HDL.

• Functional Vectors. Verification vectors in Verilog or VHDL
format. These are run on their respective simulators. They
can also be translated so that they can be run on the testers
as well. These vectors enable presilicon verification.

• Synthesis Scripts (optional). These are used to synthesize
the HDL into standard cells. They are only available for
cores that have been previously synthesized.

Outputs. For use in superchip integration and product
prototyping, the ICBD CPU team provides design groups and
customers with the following:

• Megacell. Processor core with all requirements for inclusion
in the HP intellectual property library. Deliverables include
ERS, gate-level netlist, behavioral model, data sheet, etc.

• Test Chip Data. Mask, packaging, test vectors, and test
program input for test chip fabrication and test. A core without
a test chip will not have this.

Rewrite HDL for Synthesis. The HDL that is transferred from
the processor vendor may not be synthesizable since it may
have been written only as a model, not as a synthesis source.
Current synthesis tools have limitations on the type of HDL
constructs allowed and yield very poor-quality circuits for
HDL not written with synthesis in mind. ICBD has a set of
HDL coding guidelines that need to be followed when
rewriting the HDL. This step may warrant some iterations in
synthesis to explore the optimal mapping of specific
behavioral constructs. Regardless of what changes are made to
the HDL or the quality of synthesis achieved, the changes
should not alter the functionality. In fact, any changes should
be thoroughly verified by running regression vectors as
discussed next.

Behavioral Verification. Even though behavioral verification
is a part of STF2, this portion of the flow focuses on the
extra verification that results from recoding for either
customization or synthesizability. The verification is an extensive
simulation of the altered HDL code with the vendor-provided
vectors. In the future, this step may be augmented by formal
verification. This subject will be revisited when the
verification of the test chip (discussed later) is analyzed.

Create and Modify Synthesis Scripts. The purpose of this task
is to create Synopsys Design Compiler scripts that can be
used to compile the standard cell portion of the core
consistently and systematically. As mentioned earlier, there may
not be existing synthesis scripts if the processor has never
been synthesized before. In other cases, there will be a full
suite of scripts along with design constraints. In still other
designs, portions of the control logic will have synthesis
scripts and the data path will not. This represents the design
methodology of many processor vendors.

In case synthesis scripts need to be created, design
specifications and the actual HDL are the best sources. ICBD has a
generic synthesis script that can be used as a template to
help create these scripts. Even when all of the scripts exist,
modifications are often needed to make them work in ICBD's
environment and libraries. The processor vendor's approach
to synthesis may not be the most efficient approach in ICBD's
technology. Furthermore, the vendor's approach may not
use the most up-to-date synthesis technique available in the
latest release of the synthesis tool. Many trial-and-error runs
of different approaches may be needed to determine the best
synthesis approach.

Create Quicktest Specification and PADLOC File. Quicktest
generates a test template given a target tester. The PADLOC
(pad location) file is an input to Quicktest and other tools
such as the router. This step is only needed if a test chip is
planned for the particular processor core. The need for a
test chip is determined on a core-by-core basis. The first
core in an architecture, a core with major customization,
and a customer need for prototyping are all reasons for a
test chip.

The Quicktest specification is used to generate a test
program for the processor core test chip. The Quicktest
specification documentation describes how to create the Quicktest
file from the design specification. The PADLOC file is used
to place the test chip pads around the core logic. It contains
placement data for the pad ring. The PADLOC specification
document describes how to create a PADLOC file from the
design specification.

Identify and Create Custom Modules. Not all blocks in a
microprocessor can be implemented as standard cells, although
the list of such blocks is becoming shorter and shorter. This
task identifies all blocks within the core that should be
implemented as custom logic and creates the identified blocks
along with all the models and information required to use
them in the downstream standard cell design methodology.

To identify blocks that should be custom, the HDL and
design specification must be studied along with the design
goals. Blocks are custom-designed to meet area or
performance goals unachievable with standard cells. Typically
these blocks will be limited to memory arrays. However,
they may also include structured data path logic for
implementing highly regular or speed-critical circuits, such as
a large multiport register file, multiplier, or barrel shifter.
This step may involve feasibility studies in which candidate
blocks are partially designed or estimated for both standard
cell and custom implementations. The following items must
be produced for each block selected for custom
implementation:

• Verilog or VHDL model with timing

• Synopsys timing with pin timing

• Cell3 LEF file (Cell3 is an automatic place-and-route tool
from Cadence)

• Artwork database

• Sunrise (an automatic test vector generator from Sunrise)
or ATG (an in-house HP tool similar to Sunrise) fault model.

Translate Vectors. By translating simulation-based verification
vectors into Guide format, the standard path to testers and
later portions of the tool flow is established. Guide is an
in-house HP tester-independent vector translation tool. This
step may require that custom programs or scripts be written
and supported to translate from unknown formats.
Therefore, it is advisable to require the processor vendor to supply
vectors in some known format like Verilog, for which there is
a clear path to Guide. The vectors are needed for functional
and diagnostic purposes only. Manufacturing test will not
run these vectors and will rely on IDDQ, stuck-at, and at-speed
scan testing.

Execute Standard Tool Flow 2 (STF2). As mentioned above,
Standard Tool Flow 2 is a design flow supported by an ICBD
design automation group. It includes Verilog simulation,
Synopsys synthesis, Cell3 place and route, ATG and Sunrise
full-scan testing, artwork and mask generation, and
Quicktest and Guide test program and vector creation. There is
extensive documentation on the entire tool flow. A variation
of STF2 is STF3, which supports partial-scan testing. This
may be an option for cores that would realize significant
area savings from it without the loss of appreciable test
coverage. The VHDL tool flow is derived from an HP proprietary
VHDL design flow. So far, no core has been developed using
this design flow.

Package Core as Megacell. The final step in the process is to
create the data necessary to offer the core as a megacell to
HP customers and ICBD design centers. The requirements
for this type of product are currently being defined. The
release of any core will adhere to the standards set up and
provide all the models, documentation, and support required.

Test Chip Experience 

The Coldfire test chip (Fig. 2) is a test chip implementing
the Coldfire 5202 processor from Motorola. Coldfire is a
new line of embedded microprocessors that improves
performance over the 68000 architecture while maintaining
compatibility with most of the 68000 instruction set and
minimizing area. The Coldfire 5202 has a 2K-byte 4-way
set-associative cache, a debug unit, and JTAG capability (IEEE
1149.1 boundary scan test capability) along with the core as
implemented by Motorola.

Design Transfer

The Coldfire test chip team received a brief course on the
architecture and a tape of the HDL source for the Coldfire
5202. The tape contained synthesizable HDL for every block
in the design. The cache memory blocks and some tag
maintenance logic were custom-designed at Motorola and had
only a behavioral representation. Each synthesizable block
also had a constraint file used in Synopsys. There was also
a top-level synthesis script. This represented all that was
needed to get started with the port. Subsequently, Motorola
has sent test vectors and fielded questions from ICBD. The
level of support at Motorola has been very good.

Fig. 2. The Coldfire test chip floor plan.

Making the Coldfire Test Chip

The Coldfire test chip has been targeted as a test vehicle for
this methodology and offers early prototypes for customers
interested in using the Coldfire 5202. The Coldfire test chip
is implemented in a 0.5-µm technology and takes advantage
of the process's increased density. A fairly high frequency
of 50 MHz was targeted to show the scalability of the
methodology.

Custom Modules. The team knew at the outset that Motorola
had designed custom cache RAMs and tag logic and that
custom designing caches for every core would not fit into
our methodology. The generated WEST SRAM available
from an HP RAM group provided the solution. These RAMs
are designed for ASIC integration and good performance.
However, to use these RAMs, there were two extra
requirements that were specific to this cache. First, it had to be
byte-writable, that is, each byte of a multibyte word must be
writable separately. The second requirement is that the bits
of one column needed to be reset in one cycle. This is used
to invalidate the cache valid bits at startup or in case of an
invalidate instruction. The RAM group took the first request
and incorporated byte writability (actually bit writability as
implemented in the artwork) into the generator. The
one-cycle invalidate was not changed in the generator. Instead,
the valid bits were turned into flip-flops on this test chip for
schedule reasons. This incurred an area penalty and will
hopefully be fixed in the RAM in the future.

Rewrite HDL. The interface to the cache RAM was
asynchronous in the Motorola implementation. The WEST SRAM is
synchronous. This means that addresses, data, and other
control signals need to be set up before and held until after
the rising edge of the clock. Motorola latches the various
signals on the rising edge of the clock before feeding them
to the RAM. With this scheme, the signals would have
missed the setup time on the WEST SRAM. If no latch is
used, then the hold time cannot be met. As a result,
negative-level-sensitive latches are used to provide the necessary
hold requirement. The control signals that Motorola uses are
sufficient to generate the WEST SRAM control signals. The
4-way set-associativity can be easily implemented as four
data RAMs and four tag RAMs, each representing one way
in the cache organization.
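
The organization just described (one tag RAM and one data RAM per way)
can be modeled in a few lines. This is a behavioral sketch, not the
Coldfire implementation: the 2K-byte size and 4-way associativity come
from the text, but the 16-byte line size, the class name, and the method
names are assumptions made for illustration. Valid bits are kept outside
the RAM arrays, in the spirit of the flip-flop approach used on the test
chip, so that they can all be cleared at once.

```python
CACHE_BYTES = 2048   # from the article: 2K-byte cache
WAYS = 4             # from the article: 4-way set-associative
LINE_BYTES = 16      # assumption for illustration, not stated in the article

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)     # entries in each way's RAM
OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = SETS.bit_length() - 1


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class FourWayCache:
    """Hypothetical model: one tag RAM and one data RAM per way."""

    def __init__(self):
        self.tag_ram = [[0] * SETS for _ in range(WAYS)]
        self.data_ram = [[bytes(LINE_BYTES)] * SETS for _ in range(WAYS)]
        self.valid = [[False] * SETS for _ in range(WAYS)]

    def invalidate_all(self):
        """Clear every valid bit, as at startup or on an invalidate."""
        for way in range(WAYS):
            self.valid[way] = [False] * SETS

    def fill(self, way, addr, line):
        """Write a full line into the chosen way."""
        tag, index, _ = split_address(addr)
        self.tag_ram[way][index] = tag
        self.data_ram[way][index] = bytes(line)
        self.valid[way][index] = True

    def lookup(self, addr):
        """Return (way, byte) on a hit, or None on a miss."""
        tag, index, offset = split_address(addr)
        for way in range(WAYS):
            if self.valid[way][index] and self.tag_ram[way][index] == tag:
                return way, self.data_ram[way][index][offset]
        return None
```

In hardware all four tag RAMs are read in parallel with the same index
and compared at once; the loop over ways in `lookup` stands in for that
parallel comparison.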

Another change made to save some area was the removal
of the JTAG block. The JTAG methodology does not fit into
the superintegration process since this process has its own
mechanism of doing boundary testing. The JTAG logic is a
fairly modular block that can be stripped out with minimal
perturbation to the rest of the design. The necessary logic
needed in place of the JTAG block is encapsulated in its own
level of hierarchy. The necessary JTAG-like functionality
has been replaced by a scan wrapper that better fits ICBD's
superchip test methodology.

One final change is in the area of testing. Motorola uses a test
mode called ad-hoc mode to test the cache RAM circuitry.
ICBD uses BIST (built-in self-test) instead. As a result, the
logic that makes the cache RAM controllable and observable
and interlocks the pipeline is no longer needed. By rendering
the ad-hoc mode inactive, all logic associated with this test
mode can be minimized.

Synthesis Script Modification. The synthesis script that came
from Motorola put highly detailed constraints on each block.
The reasons are twofold. Motorola was concerned with the
speed of the synthesis job if the entire design were compiled
at once and wanted to be able to do most of the compilation
at the block level. Secondly, Motorola had good ideas about
where Synopsys should be spending its time and wanted to
influence the tool in that direction. A makefile generator was
used to piece together the numerous block-level compiles
and their respective constraint files. This approach does not
necessarily yield the best design in ICBD's technology, as
was found during initial synthesis trials using Motorola's
script. The block-level constraints were often unrealistic
and Synopsys was spending optimization cycles on the
wrong circuits. As a result, the synthesis scripts were
overhauled to put constraints only at the top level, that is, the I/O
specifications of the chip.

A hierarchical compile at the top level replaced the
block-level compile. As a result, Synopsys has more freedom in
partitioning the time. The compile time is long but not
intolerable. The entire synthesis job from reading in HDL to
outputting an optimized netlist at 50 MHz takes 48 hours on an
HP 9000 Model 755 server. For fast turnaround needs, such
as testing a quick fix of a bug, block-level constraints
obtained from hierarchical characterization and the write
script of the previous synthesis run can be used to compile
at the block level. To get even better area, the entire design
has been compiled with the hierarchy flattened so that
inter-block optimization can be performed during synthesis.

Verification. All the changes mentioned earlier have been
simulated extensively to make sure that the desired
functionality is achieved without having broken some other part
of the design. Once the functionality is determined, then the
HDL is synthesized to obtain a netlist from which both static
timing and dynamic timing analysis are done. The majority
of the emphasis in timing verification centers on static
timing analysis. Synopsys' timing analyzer is used to generate
timing reports. False paths and multicycle paths have been
carefully reviewed to make sure that there is no escaped
path in the report. Both maximum and minimum paths are
reported to expose possible setup and hold violations.
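
The reason both maximum and minimum paths are reported can be shown
with the standard slack equations: the slowest path is checked against
setup time at the next clock edge, and the fastest path is checked
against hold time at the same edge. The sketch below is a simplified
illustration, not the Synopsys report format. The 20 ns period follows
from the 50 MHz target in the text; the path delays, setup and hold
times, and dictionary keys are hypothetical.

```python
PERIOD_NS = 1e3 / 50.0  # 50 MHz target from the article gives 20 ns


def setup_slack_ns(max_path_ns, setup_ns, period_ns=PERIOD_NS, skew_ns=0.0):
    """Slack of the slowest (maximum) path against the next capturing edge.

    skew_ns is capture clock arrival minus launch clock arrival.
    """
    return period_ns + skew_ns - max_path_ns - setup_ns


def hold_slack_ns(min_path_ns, hold_ns, skew_ns=0.0):
    """Slack of the fastest (minimum) path against the same capturing edge."""
    return min_path_ns - hold_ns - skew_ns


def violations(paths):
    """Return (name, kind) for every path with negative slack.

    Each path is a dict with hypothetical keys:
    name, min_ns, max_ns, setup_ns, hold_ns.
    """
    found = []
    for p in paths:
        if setup_slack_ns(p["max_ns"], p["setup_ns"]) < 0:
            found.append((p["name"], "setup"))
        if hold_slack_ns(p["min_ns"], p["hold_ns"]) < 0:
            found.append((p["name"], "hold"))
    return found


if __name__ == "__main__":
    example = [
        {"name": "alu", "min_ns": 1.2, "max_ns": 19.8,
         "setup_ns": 0.5, "hold_ns": 0.3},
        {"name": "scan", "min_ns": 0.1, "max_ns": 2.0,
         "setup_ns": 0.5, "hold_ns": 0.3},
    ]
    print(violations(example))
```

A false path or multicycle path that is wrongly excluded would simply
never reach a check like this, which is why the text stresses reviewing
those exceptions by hand.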

The vectors run on the design include benchmark,
diagnostic, and RIS vectors from Motorola. Motorola has developed
a sophisticated random instruction sequence (RIS)
generator that can be tuned to generate instructions in an area of
interest along with random interrupts and exceptions to
perturb the processor. In future Coldfire cores, the ability to
generate RIS vectors will be incorporated into the
verification process. This time, Motorola has generated all the RIS
vectors and sent them to ICBD. Verification using a more
formal method of binary decision diagram comparison has
also been pursued using Motorola's in-house tool. This step
will not be available for every core since most processor
vendors do not support this methodology.

Simulation of the netlist had some hurdles. One is the
inability of the netlist to reset properly. This problem has its
roots in the way the reset logic was done in the HDL. Instead of
using an explicit reset inference on all flip-flops, the reset
logic became part of the input logic. Depending on where
the reset was structured in the logic, it might or might not
cause a particular flip-flop to reset correctly. In fact, this
problem is more general. Every time a reset-like signal
is used, unknown states (Xs) are not guaranteed to be
suppressed. Unknown states are periodically introduced
into the design by captured vectors that use uninitialized
memory for operations. For example, an uninitialized stack
memory may be used to fill a cache line and pushed back to
memory. Granted, this is a mere simulation issue. However,
it makes verification harder because only after these issues
are fixed can other real problems become visible: problems
that would have been masked can then be caught.

Motorola will restructure their HDL to avoid this problem in
the future. However, for the Coldfire test chip, several steps
have been taken to remedy the problem. Every flip-flop and
latch in the design is reset using a force-and-release pair
upon startup. When unknown states are introduced into the
system, the pattern is intercepted and given a random value
instead. Since unknown states are essentially conditions in
which the state of one or more bits is unknown, randomizing
these patterns effectively gives an unknown pattern without
the simulation nightmare.
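
The interception step can be pictured with a minimal sketch. It
assumes a captured pattern is available as a string of '0', '1', and
'x' characters; the function name and the example word are
hypothetical and do not describe the actual testbench used on the
test chip.

```python
import random


def randomize_unknowns(pattern, rng):
    """Replace every unknown bit ('x' or 'X') with a random 0 or 1."""
    return "".join(
        rng.choice("01") if bit in "xX" else bit
        for bit in pattern
    )


if __name__ == "__main__":
    rng = random.Random(1997)  # seeded so a failing run can be reproduced
    captured = "10xx01x1"      # hypothetical word read from uninitialized memory
    print(captured, "->", randomize_unknowns(captured, rng))
```

Known bits pass through unchanged, so only the bits that were
genuinely undefined take arbitrary values.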

Test Strategy. To make the test work with STF2 as mentioned
earlier, the core is full-scan except the register file, the
instruction buffer, and the latches in front of the cache RAM.
The latches can also be tested using the latest methods from
ATG and thus offer virtually no degradation of the test
coverage. The cache RAM has BIST circuitry testing all eight
RAMs in parallel. The BIST mode is encoded in the test
mode pins available on the Coldfire 5202. A limited number
of functional vectors run in verification are also ported to
the testers.

Technology Independence. The entire core is technology
independent. The only technology dependent portion of the
Coldfire test chip is the pads. Since the prototypes are
targeted to be used in Motorola's 5V evaluation boards, the 3V
and 5V pads in the HP CMOS14 library are used. Since only
I/O pads exist, input and output only pads are made by tying
off the appropriate enable signals. The pads are instantiated
only for the test chip. Synthesizable HDL versions of the
pads do exist and can be synthesized to buffers when the
megacell becomes available.

Results 

The Coldfire test chip is the first trial of the proposed
methodology. The performance target of 50 MHz has been met
with no custom cells or modules. Both small die size and
high test coverage were achieved by this chip. Higher row
utilization is only limited by extreme congestion spots like
the barrel shifter, register file, and pipeline control block. An
even smaller version is possible with a few modifications in
key areas. In addition, future versions are not expected to
have the overhead of the invalidate registers implemented
as flip-flops. The gates may also be resized to meet the actual
target frequency.

The desired changes have been successfully implemented.
Motorola's custom cache was turned into synthesizable
control and generated BIST SRAM. The JTAG has been removed
with minimum changes to the original HDL. BIST and test
circuitry have been added. All of these changes have been
verified at the functional and netlist levels. Being able to
make changes at these levels and maintain high confidence
in the design is an invaluable advantage with this approach
that would not have been possible with artwork porting.

Data management that is needed to maintain the coherency
of the design is an important aspect of the project that cannot
be overlooked. Problems in this area occurred fairly early in
the project. Scripts were written to make use of lineup files,
that is, lists of designs with specific revision numbers that
go together for a particular simulation or synthesis run.
Changes that are not yet released are made in private
directories that can be part of a private lineup file. The massive
verification effort requires jobs to be run at every available
time, using every available open Verilog license. Scripts have
also been written to use HP Task Broker to get maximum
efficiency of the available resources.
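
The lineup idea can be sketched as follows. The file format shown
here (one "design revision" pair per line, with '#' comments), the
function names, and the block names are all assumptions made for
illustration; the article does not describe the actual format of the
ICBD lineup files.

```python
def parse_lineup(text):
    """Parse 'design revision' lines into a dict; '#' starts a comment."""
    lineup = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        design, revision = line.split()
        lineup[design] = revision
    return lineup


def merge_lineups(released, private):
    """Private entries override released ones for a private run."""
    merged = dict(released)
    merged.update(private)
    return merged


if __name__ == "__main__":
    # Hypothetical block names and revision numbers.
    released = parse_lineup("cache_ctl 1.4\nbarrel_shifter 2.1  # released\n")
    private = parse_lineup("cache_ctl 1.5\n")
    print(merge_lineups(released, private))
```

Keeping unreleased revisions in a separate private lineup means a
trial run never changes what other users see in the released lineup.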

Conclusion

Porting processor cores using the new ICBD methodology of
standard cell synthesis has been shown to be a viable
alternative to the traditional artwork port. HDL porting has the
advantages of testability, technology independence,
customizability, efficient area use, system simulation capability, and
presilicon verification. It is also a straightforward
methodology to support since virtually all components of it are already
in use in the HP Standard Tool Flow 2.

The approach has its disadvantages. It cannot be applied
indiscriminately on any processor core. Many cores
designed today still do not have synthesizable HDL. The
synthesizability of the core may also run the gamut from being
very easy to extremely difficult, depending on a host of issues
such as clocking strategy, coding style, and architecture
complexity. The need for customization puts even higher
expectations on the quality of the HDL. Trying to change the
functionality of a design written with raw Boolean equations
and flip-flop instantiations is almost as daunting as editing a
netlist. Therefore, the selection of a microprocessor vendor
may depend on the vendor's design methodology. For cores
that do not have synthesizable HDL, artwork porting may
still be the only option.
HDL porting will become increasingly feasible with better
synthesis tools and denser and faster technology. The
advances in these two areas have now reached a threshold at
which implementation of entire microprocessor cores with
standard cells compiled using HDL synthesis is practicable.
As more processors are designed using HDL and synthesis,
this methodology will become more general. As the speed of
the technology increases, the level of processor performance
achievable using this methodology also increases. Silicon
compilation is slowly becoming a reality. IC porting in the
future should reach a level similar to porting software today,
as designs are targeted to different technologies with a few
changes in the synthesis and constraint scripts.

Acknowledgments

The author would like to thank the entire Coldfire team:
Mark Reed, Paul Chenani, Tom Tliaitiien, and especially Jay
McDougal for his work on synthesizable core methodology
and synthesis techniques. Thanks also go to Tim Bnn^Ti and
Neal Jaarsma for looking into the superintegration and test
aspects of the core. The team appreciates the support from
the Motorola Coldfire team, especially Jeff Freeman and his
timely responses and advice.
General-Purpose 3V CMOS
Operational Amplifier with a New
Constant-Transconductance Input
Stage

Design trade-offs for a low-voltage two-stage amplifier in the HP CMOS14
process are presented and some of the issues of low-voltage analog
design are discussed. The design of a new constant-transconductance
input stage that has a rail-to-rail common-mode input range is described,
along with the rail-to-rail class-AB output stage. The performance
specifications and area of this amplifier are compared with a similar
design in a previous process, CMOS34.

by Derek L. Knee and Charles E. Moore



Experience gained over the last few years within the design
centers of the HP Integrated Circuit Business Division (ICBD)
has shown that a general-purpose operational amplifier is a
fundamental building block for many mixed-signal integrated
circuits. These general-purpose operational amplifiers are
typically used in support functions and not in the
high-frequency differential signal paths.

With the recent process release of AMOS14TB, the analog
version of the HP CMOS14TB IC process, the logical step was
to design a general-purpose operational amplifier for use with
mixed analog/digital chips using AMOS14TB. However, from
an analog standpoint, the technology change from CMOS34,
the most recent process in which analog circuits had been
implemented, to CMOS14 was quite severe because of the
power supply reduction from 5V to 3.3V. Because of the lower
supply voltage specification, new circuit design techniques
needed to be developed and the general-purpose operational
amplifier was chosen as one of the test vehicles to achieve
this goal. The amplifier was also integrated onto an AMOS14
test chip.

Design Objectives 

Because of the usefulness of the previous CMOS34
general-purpose operational amplifier, the electrical specifications
for the AMOS14 version were derived from the CMOS34
amplifier. The power supply range was altered because of
the technology change. Other parameters such as input
offset voltage, input referred noise, and size were to be
minimized, while open-loop voltage gain, gain margin, phase
margin, and power supply rejection ratio were to be
maximized. A list of the design objectives is shown in Table I.

Configuration 

Based on the design objectives shown in Table I and the
experience of the authors in the design of previous
general-purpose operational amplifiers, a two-stage configuration


Table I
Design Objectives for AMOS14 Operational Amplifier

Parameter                                Target Value

Single-supply operation                  2.7V ≤ AVDD ≤ 3.6V
Temperature range, Top                   0°C ≤ Top ≤ 110°C
Outputs                                  Single-ended
Low quiescent power consumption          IDD ≤ 1 mA
Small-signal bandwidth f0 (unity gain)   1 MHz ≤ f0 ≤ 5 MHz
Small-signal bandwidth                   Independent of CMIR
Slew rate SR                             1 V/µs ≤ SR ≤ 5 V/µs
Output voltage range                     AVSS + 0.2V ≤ Vout ≤ AVDD - 0.2V
Common-mode input range CMIR             AVSS ≤ CMIR ≤ AVDD
Load capacitance range                   [illegible]
Load resistance range                    Rload ≥ 300Ω



with a class-AB output stage was chosen. This configuration
is capable of satisfying the power and load requirements. An
added constraint for the AMOS14 version (based on
limitations of the previous versions) is the specification for
constant small-signal bandwidth, independent of the
common-mode input range, CMIR. The amplifier has a differential
input, and the common-mode input voltage is the average
value of the two input voltages. CMIR is the range over
which the common-mode input voltage is expected to vary.
A small-signal bandwidth that is independent of CMIR
implies that the input differential stage has a constant
small-signal input transconductance, gm, over the full CMIR, even
if the CMIR is as large as the difference between the power
supply rails.

Leveraging the new AMOS14 circuit design from the existing
CMOS34 design was difficult because the power supply
voltage range is reduced while the PMOS and NMOS transistor
thresholds, Vtp and Vtn respectively, are essentially
unchanged. The power supply range for AMOS14 is reduced by
33% from that of CMOS34. This power supply reduction is
fairly significant for analog designs in which devices are
connected in series. The outcome was that new low-voltage
design techniques had to be employed to implement the
equivalent operational amplifier in AMOS14 technology.

Constant-Transconductance Differential Input Stage

To obtain a differential input stage that operates over a
rail-to-rail input voltage range requires an NMOS and PMOS pair
driven in parallel. Because of complementary biasing
requirements, special circuit design precautions need to be taken to
ensure that the overall gm, or the sum of the individual
transistor gms, remains constant over the CMIR. Without this
added circuitry, the frequency compensation could not be
optimized over the CMIR.

The requirements for the constant-gm input stage are:

• A simple circuit with a minimum number of components

• Low-voltage operation

• Input devices operating in the square-law region where gm
is highest

• Constant-gm control circuitry operating in a closed-loop
mode with the input differential devices to exhibit smooth
transition regions over the CMIR

• Constant-gm control circuitry that does not use reference
voltage trip levels to control the differential bias currents,
thus avoiding coupling supply noise into the input stage.

An extensive search of the literature could not locate a
circuit that met this list of requirements. Therefore, a new
constant-gm input stage was needed.

If Itn and Itp are the tail currents of the NMOS and PMOS
differential pairs respectively, then the following
relationship is required for any common-mode input voltage:

$$\sqrt{2 K_n I_{tn}} + \sqrt{2 K_p I_{tp}} = g_m = \text{Constant}, \tag{1}$$

where

$$K_n = \mu_n C_{ox} \frac{W_n}{2 L_n} \tag{2a}$$

and

$$K_p = \mu_p C_{ox} \frac{W_p}{2 L_p}. \tag{2b}$$

In equations 2a and 2b, µ is the carrier mobility under the
channel, Cox is the transistor gate capacitance per unit area,
W is the transistor gate width, and L is the transistor gate
length.
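
The square-root form of equation 1 follows from the square-law
device model. As a brief check, assume each transistor of a
differential pair carries half of its tail current It and obeys
I_D = K (V_GS - V_t)^2. This model is consistent with the definitions
in equations 2a and 2b but is not written out in the text, so it is
stated here as an assumption:

```latex
\begin{align}
g_{m,\text{device}} &= \frac{\partial I_D}{\partial V_{GS}}
                     = 2K\,(V_{GS} - V_t)
                     = 2\sqrt{K I_D}, \\
I_D = \frac{I_t}{2} \;\Rightarrow\;
g_{m,\text{pair}} &= 2\sqrt{K \frac{I_t}{2}} = \sqrt{2 K I_t}.
\end{align}
```

Summing this expression for the NMOS pair and the PMOS pair gives the
left side of equation 1.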

If the PMOS and NMOS transistors are sized so that Kn = Kp,
then equation 1 can be rewritten as:

$$\sqrt{I_{tn}} + \sqrt{I_{tp}} = \text{Constant}. \tag{3}$$



A new feedback control loop circuit was designed that
controls the bias currents in the NMOS and PMOS differential
pair transistors so that equation 3 holds for all
common-mode input voltages. This new circuit is shown in Fig. 1. It
uses what the authors refer to as the 4I/I principle.

In Fig. 1, transistors N0A, N0B, N1A, and N1B form the
NMOS input section. Devices P0A, P0B, P1A, and P1B form
the PMOS input section. These two sections together form
the input stage to an operational amplifier. The output
currents from these sections, IOPP, IOPN, IONP, and IONN, are
summed in the first gain stage, described below. It is the
overall gm of these NMOS and PMOS input devices that is
held constant over the CMIR.
Fig. 1. Constant-transconductance
The current mirror N2C biases the NMOS input pair and the
current mirror P2C biases the PMOS input pair. The NMOS
CMIR monitor devices, N1A and N1B, are biased by N2A and
N2B at a current of 3I. The PMOS CMIR monitor devices, P1A
and P1B, are also biased by P2A and P2B at a current of 3I.

For midsupply common-mode input range, both the NMOS
input section and the PMOS input section are biased on.
The PMOS CMIR input monitor devices, P1A and P1B,
source a current of 3I to the node CMN. This 3I source
current is added algebraically to the 4I current sink of N2C,
resulting in the NMOS differential pair, N0A and N0B, being
biased at a current I. Similarly, the NMOS CMIR input
monitor devices, N1A and N1B, sink a current of 3I from the node
CMP. This 3I current is added algebraically to the 4I source
current of P2C, resulting in the PMOS differential pair, P0A
and P0B, being biased at a current I. Therefore the NMOS
and PMOS input sections are both biased at I for the
midsupply common-mode input.

For common-mode inputs near AVDD, the NMOS input
section is biased correctly, but the PMOS input section is off.
The current source devices P2B and P2C are also off and
the PMOS CMIR monitor devices, P1A and P1B, supply no
current. Since no current is added to the current source
N2C, the NMOS differential pair, N0A and N0B, is now
biased at a current of 4I. A similar argument holds for the
PMOS devices when the common-mode input is close to
AVSS, and the PMOS transistors are biased at 4I.

The differential input sections will be biased in one of the
following modes:

1. The NMOS devices biased at 4I and the PMOS section
with no bias current, for common-mode inputs near AVDD:

$$\sqrt{4I} + \sqrt{0I} = \text{Constant}. \tag{4a}$$

2. The PMOS devices biased at 4I and the NMOS section
with no bias current, for common-mode inputs near AVSS:

$$\sqrt{0I} + \sqrt{4I} = \text{Constant}. \tag{4b}$$

3. Both sections biased at I when the CMIR is such that both
N2B and P2B are biased correctly:

$$\sqrt{I} + \sqrt{I} = \text{Constant}. \tag{4c}$$
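
The three bias modes can be checked numerically. The sketch below
evaluates the overall gm from equation 1 with Kn = Kp for each mode;
the values chosen for K and for the unit current I are arbitrary
placeholders, not values from the AMOS14 design.

```python
import math


def overall_gm(k, i_tn, i_tp):
    """Sum of the NMOS and PMOS pair transconductances for Kn = Kp = k."""
    return math.sqrt(2.0 * k * i_tn) + math.sqrt(2.0 * k * i_tp)


if __name__ == "__main__":
    K = 200e-6  # A/V^2, hypothetical, same for both pairs
    I = 10e-6   # A, hypothetical unit bias current
    modes = {
        "NMOS only (4I, 0)": (4 * I, 0.0),
        "PMOS only (0, 4I)": (0.0, 4 * I),
        "both on   (I, I)": (I, I),
    }
    for name, (i_tn, i_tp) in modes.items():
        print(f"{name}: gm = {overall_gm(K, i_tn, i_tp) * 1e6:.1f} uS")
```

All three modes give the same value because the square root of 4I
equals twice the square root of I, which is the point of the 4I/I
principle. The sketch covers only the three end states and says
nothing about the transition regions between them.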



The closed-loop CMIR monitor circuitry smoothly controls
the transition between these three modes of operation. This
is demonstrated in Fig. 2. The x-axis of Fig. 2 represents
the CMIR from AVSS to AVDD (rail to rail). The upper curve
shows the overall gm, or the sum of the NMOS and PMOS
input stage gms, while the lower curves show the individual
gms of the input sections as a function of CMIR. The overall
gm has a total variation of only 5%. This number includes the
second-order effects of subthreshold operation and output
conductance.

Fig. 3 shows the simulated variation of the intrinsic input
offset voltage, Vos, as a function of the CMIR. This curve
shows one of the limitations of a complex input differential
pair input structure: the input offset voltage varies as each
of the input differential pairs is activated or deactivated.
During the transitions between modes, the common-mode
rejection ratio, CMRR, is reduced. Therefore, the design
of Fig. 1 attempts to minimize the width of these transition
regions with respect to CMIR.

First Gain Stage 

The first gain stage sums the four output currents from the
input differential stage: IOPP, IOPN, IONP, and IONN. The
criteria for selecting the best gain stage were:

• The stage should use wide-swing cascode current sources.

• It should interface easily with the following class-AB output
stage or second gain stage.

• It should not add any additional noise or offset to the input
stage.

The gain and current summing stage selected is shown in
Fig. 4. This stage reduces the transistor count
considerably because of its compact integration with the class-AB
output stage (see next section).
Fig. 2. The x-axis represents the common-mode input range (CMIR)
of the circuit of Fig. 1 from AVSS to AVDD (rail to rail), in volts. The
upper curve shows the overall gm. The lower curves show the individual
gms of the input sections as a function of the CMIR.

Fig. 3. Simulated variation of the intrinsic input offset voltage, Vos,
as a function of the CMIR (x-axis: common-mode input range in volts).
Fig. 4. Schematic diagram of the first gain stage, which sums the
four output currents from the input stage.



Second Gain Stage

The criteria used in the selection of the class-AB output
stage implementation were:

• Simple and high-speed design

• No complex active or amplifier feedback paths in the AB
control circuitry

• Low-VDD operation

• Good power supply rejection ratio

• No direct dependence on supply voltage for the bias current
setup

• No noise or offset to be added to the first stage of the
amplifier.

The output stage chosen is shown in Fig. 5. The circuit
shown in Fig. 5a is a simplified version of the output stage.
The schematic in Fig. 5b shows the implementation of the
AB output stage integrated together with the first gain stage.
This output stage was first developed for 5V operation and
later modified for an all-digital process.

The output stage uses common-source output devices for
low-voltage operation. The theoretical minimum supply
voltage is twice the MOS threshold voltage plus a saturation
voltage. The complementary output devices PDR and NDR
are driven by complementary common-gate level shifters,
PAB and NAB. The first-stage input signals are fed into the
output stage at nodes PDRV and NDRV. During quiescent
operation, PAB and NAB are biased in the conducting state.
The potentials at PDRV and NDRV are established to
minimize the quiescent current through the large output driver
devices, PDR and NDR. This biasing arrangement is
established through two translinear loops. The loop that biases
PDR consists of P5A, P5B, PAB, and PDR. Similarly, NDR is
biased by the loop consisting of N5A, N5B, NAB, and NDR.
For a short tutorial on translinear theory see reference 17.

During a negative slew at the output, the gate voltage of
NDR is pulled high. Since the bias voltage ABN is fixed, the
device NAB will shut off. The device PAB will then be
conducting the full bias current, IFC, which will result in an
increase in the gate-to-source voltage of PAB and consequently
a reduction in the gate-to-source voltage of PDR. A similar
operation occurs during positive sourcing when the bias
voltage for NDR is reduced.

The integration of the class-AB stage and the first gain stage
has two major advantages. The first advantage is the floating
current source, IFC, which is set up through two additional
translinear loops: N5A, N5B, NFC, N3A and P5A, P5B, PFC,
P3A. Because of the floating nature of the bias devices NFC
and PFC and NAB and PAB, this current source contributes
much less to the noise and offset of the amplifier. Secondly,
the variation of the output quiescent current is reduced
because the floating current source of PFC and NFC tracks
the AB current source of NAB and PAB.

Final Circuit and Results 

The complete schematic for the AMOS14 operational
amplifier is shown in Fig. 6. This figure shows in detail the
cascode current source implementation.

Fig. 7 shows the open-loop small-signal frequency response
and phase characteristics of the amplifier driving four
different load combinations. These are 10 MΩ||1 pF, 10 MΩ||100 pF,
300Ω||1 pF, and 300Ω||100 pF. Fig. 8 shows the small-signal
frequency response and phase characteristics of the amplifier
for different CMIR values ranging from AVSS to AVDD. Note
that the unity-gain frequency f0 is essentially independent of
CMIR.

The small-signal step response is shown in Fig. 9 for the
same load combinations as Fig. 7. The large-signal step
response, indicative of the amplifier's slew rate, is shown
in Fig. 10. The artwork layout for the operational amplifier
is shown in Fig. 11.

Table II illustrates the overall similarities of the AMOS14
operational amplifier to the CMOS34 version. In summary,
the AMOS14 design achieved a 2× improvement in
bandwidth, a 2.5× increase in class-AB output drive, and an
improved slew rate in a third of the area while at the
same time including the additional constant-gm circuitry.
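
The "third of the area" figure can be checked against the cell sizes
listed in Table II (251 µm × 141 µm for AMOS14 and 460 µm × 210 µm
for CMOS34). The short calculation below is only an arithmetic check;
the helper name is arbitrary.

```python
def cell_area_um2(width_um, height_um):
    """Rectangular cell area in square micrometers."""
    return width_um * height_um


AMOS14_AREA = cell_area_um2(251, 141)  # 35,391 um^2
CMOS34_AREA = cell_area_um2(460, 210)  # 96,600 um^2
AREA_RATIO = AMOS14_AREA / CMOS34_AREA  # roughly 0.37

if __name__ == "__main__":
    print(f"AMOS14 cell uses {AREA_RATIO:.0%} of the CMOS34 cell area")
```

The ratio comes out slightly above one third, in line with the
rounded claim in the text.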
Table II
Amplifier Process Comparison

Parameter                               AMOS14            CMOS34

Supply voltage                          2.7 to 3.6V       4.5 to 5.5V
Supply current                          625 µA            750 µA
Common-mode input range (CMIR)          AVSS to AVDD      AVSS to AVDD
Constant-gm input stage                 Yes               No
Input stage gm variation,               < ±[illegible]    [illegible]
  AVSS ≤ CMIR ≤ AVDD
Intrinsic input offset voltage,         -110 µV           -120 µV
  VCMIR = AVDD/2
Resistive load                          300Ω min          [illegible] min
Capacitive load                         100 pF max        100 pF max
Maximum output drive current, Imax      ±5 mA             ±2 mA
Maximum output swing at Imax            AVDD - 0.25       AVDD - 0.3
Minimum output swing at Imax            AVSS + 0.25       AVSS + 0.3
Open-loop gain (no load)                > 100 dB          > 100 dB
Slew rate                               6 V/µs            0.5 V/µs*
Unity-gain bandwidth, f0                4 MHz             0.5 MHz*
Phase margin**                          55 degrees        49 degrees
Gain margin**                           -10 dB            -8 dB
PSRR+, AVSS ≤ CMIR ≤ AVDD               > 70 dB           > 80 dB
PSRR-, AVSS ≤ CMIR ≤ AVDD               > 70 dB           > 80 dB
Cell size                               251 µm × 141 µm   460 µm × 210 µm

* Depends on CMIR.
** Absolute worst-case conditions for [illegible], AVDD, R, C, models. See Fig. 7.
CMIR = common-mode input range.
PSRR = power supply rejection ratio.

Acknowledgments

The authors thank their project manager, Rajeev Badyal, for
his encouragement during the development of this amplifier.
Improving Heat Transfer from a
Flip-Chip Package

The lid of an ASIC package can significantly increase the temperature of
the die by impeding heat transfer. In flip-chip packages the backside of a
die can be exposed by eliminating the lid, thus allowing a heat sink to be
attached directly. Numerical finite difference methods and experimentation
were used to investigate the differences between lidded and lidless
flip-chip designs. The results demonstrate that a lidless package is a
superior design because of the increased thermal conductivity between
the die and the heat sink.

by Cullen E. Bash and Richard L. Blanco



The cooling of electronic components has traditionally been
considered as two separate problems: optimizing the internal
thermal path within the package, and cooling the packaged
component by optimizing the external thermal path. While
this method has the advantage of being partitionable and
therefore solvable independently by separate organizations
or companies, it fails to engineer the thermally optimum
solution. This is especially critical for high-power dice,
which typically require custom heat sinks.

The electronics industry is moving in the direction of lidless
flip-chip packages, which create new possibilities for cooling
the die. Processor chips from other major electronics
suppliers are currently available in lidless packages because of
their thermal and cost advantages.

As an experiment to improve the design of a high-power
processor package, the HP PA 8000 processor, a proposed
design of a lidless package was compared to the traditional
lidded package currently in use. An example of a lidless
package using an air cooled heat sink has been discussed in
an earlier paper. In the present investigation, the proposed
design uses the evaporator of a heat pipe assembly to
contact the die, thus replacing the lid. This concept has the
additional benefit of reducing the cost of the package by
eliminating the relatively expensive lid.

The investigation began by constructing finite difference
models of the lidded and lidless packages. The purpose of
the models was not to correlate with measured results but
to aid in understanding the magnitude of the relative
improvements of the lidless design. After reviewing the results,
laboratory measurements were made of the two designs and
the relative improvements in thermal performance were
recorded.

Two different methods were chosen to cool the packages.
The heat pipe employed in the current HP PA 8000 design
was a natural choice because of its practicality. Additionally,
because of concerns about thermal gradients in the
aluminum heat pipe evaporator and the difficulty of matching
these to the boundary conditions in the finite difference
model, a very efficient but impractical liquid cooled heat
sink was chosen. The liquid cooled heat sink is highly
efficient and behaves like an isothermal block, which is easily
modeled.

For consistency throughout this paper, the term aluminum
evaporator heat sink refers to the aluminum evaporator
on the heat pipe assembly that directly sinks heat from the
package. Likewise, the term copper block heat sink refers to
the copper block on the liquid cooled heat sink that acts in
the same capacity.

Package Construction

The lidded and lidless package designs are shown in Fig. 1 for
the aluminum evaporator heat sink. Both packages are
constructed identically between the printed circuit board and
the die. Mounted on an FR-4 printed circuit board is a plastic
socket containing 1085 contacts made from 0.025-mm gold
plated molybdenum wire (see Fig. 2). A ceramic land grid
array package rests on the socket, making electrical contact
between the die and the board. The processor die is attached
using flip-chip technology, resulting in about 2500 solder
bump connections encapsulated by an underfill material
between the ceramic substrate and the silicon die. Fig. 3
shows the lidless package, plastic socket, and printed circuit
board assembly. The aluminum carrier shown in the picture
is used to support the assembly. The heat sink has been left
off so that the assembly can be seen more clearly.

The lidded design uses silver-filled epoxy between the die
and the lid to enhance thermal performance. The lid is
fabricated from a Kovar ring brazed to a sheet of tungsten copper.
Fig. 4 shows the lidless and lidded packages side by side for
comparison. A more detailed description of the lidded
package can be found elsewhere in the literature.

The lidless design uses Dow Corning 340 thermal grease as
the thermal interface above the die. This is a conservative
choice considering that there are thermal greases available
that have thermal conductivities more than three times that
of Dow Corning 340.
Measurement Technique

To compare the thermal performance of the two packages,
a thermal test die with a temperature-sensitive resistor was
placed into each package to allow direct measurement of
the die temperature. The packages were each tested on the
same socketed printed circuit board connected to an HP
75000 data acquisition system and a power supply. An HP
9000 Series 700 workstation with data acquisition software,
HP VEE, displayed the die temperature as a function of time
while the power supply provided the power to the die.

The two thermal test dice were calibrated in a Delta Design
9000 Series convective oven. Resistances were captured with
the data acquisition system at four different temperatures
ranging from 18 to 90 degrees Celsius. A least-squares fit
was obtained for each package and the results were placed
into HP VEE.
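
The calibration step can be sketched as an ordinary least-squares
line through four resistance readings, which is then inverted to turn
a measured resistance into a die temperature. The linear model, the
function names, and every numeric value below are assumptions made
for illustration; the article gives neither the form of the fit nor
the calibration data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def resistance_to_temperature(r_ohm, slope, intercept):
    """Invert the calibration line R = slope * T + intercept."""
    return (r_ohm - intercept) / slope


if __name__ == "__main__":
    temps_c = [18.0, 42.0, 66.0, 90.0]          # hypothetical oven set points
    res_ohm = [1070.0, 1165.0, 1262.0, 1357.0]  # hypothetical readings
    slope, intercept = fit_line(temps_c, res_ohm)
    print(resistance_to_temperature(1200.0, slope, intercept))
```

Fitting each test die separately, as the text describes, absorbs
die-to-die differences in the sense resistor.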

Four experiments were undertaken comparing each package
(lidded and lidless) cooled by each of the heat sinks
(the aluminum evaporator and the copper block).

Copper Block. The copper block heat sink was used to provide
an isothermal surface to the package to which it was
attached. This was accomplished via an efficient liquid
cooled heat sink mounted to the backside of the highly
conductive copper block as depicted in Fig. 5. The liquid
cooled heat sink consists of a partially hollowed aluminum
block through which water is cycled. The water is cooled by
ambient air via a heat exchanger. Measurements showed
that the surface of the copper block was kept isothermal to
within 5°C, which indicated that the liquid cooled heat sink
was functioning as intended.

Each package was tested with the copper block heat sink by
compressing it between the upper and lower stiffeners with
a C-clamp. The setup was similar to Fig. 6, but with the heat
pipe replaced by the liquid cooled heat sink. A load cell was
employed to measure the compressive force being generated
by the clamping assembly and stiffener plates were used to
distribute the C-clamp load. Each assembly was compressed
to 150 pounds to ensure comparable contact resistance
between the two packages. Three thermocouples were placed
within the copper block to record the heat sink temperature.

Aluminum Evaporator. The aluminum evaporator is cooled by
a heat pipe assembly. The assembly is constructed of three
sintered copper pipes with water as the working fluid
mounted planar to the evaporator, and thin aluminum fins
are attached to the opposite end of the pipes. Heat from the
aluminum evaporator enters the pipes, causing the water
to vaporize. The steam is condensed at the other end of the
pipes by air flowing over the fins. The water then returns to
the evaporator via capillary action, thus completing the
thermodynamic cycle. Upon measurement, it was discovered
that the aluminum evaporator was indeed isothermal like
the copper block, although at a higher temperature.

The aluminum evaporator was used to test the thermal
performance of the packages in a manner similar to the copper
block. A clamping assembly comparable to that used for
the copper block was employed (the clamping assembly is
shown in Fig. 6 with the heat pipe). The entire assembly was
placed in a wind tunnel with a nominal velocity of IM meters
per second. A single thermocouple was placed near the
evaporator plate/package interface to record temperature.

Fig. 2. Plastic socket with 1089 gold plated molybdenum wire
contacts.

Fig. 3. The experimental assembly without the heat sink, showing
the printed circuit board, socket, and lidless package.

Fig. 4. Lidless and lidded packages used in the experiment.
The lidded package is on the right.

Data Comparison Methodology

Thermal resistance will be used throughout this paper as a
means of comparing the data obtained from modeling and
measurement. It is defined by equation 1 and frequently
calculated using empirical data with equation 2:



$$R = \frac{L}{kA} \tag{1}$$

$$R = \frac{\Delta T}{Q} \tag{2}$$



where L is the thickness of the material, k is the material
thermal conductivity, A is the cross-sectional area, ΔT is
the measured temperature difference, and Q is the heat
flow. By definition, thermal resistance is applicable for
one-dimensional, steady-state heat transfer with no internal
energy generation. In electronics packaging one rarely
encounters one-dimensional heat transfer and there is
significant internal energy generation in the silicon die.
Additionally, it is rarely ever known explicitly how much heat
is flowing into the heat sink relative to that being absorbed
by the board. Typically, if no additional information is known
it is assumed that all of the heat is dissipated into the heat
sink. Nevertheless, with the restrictions on equations 1 and 2
and the unknowns involved, thermal resistance remains a useful
quantity for the comparison of similar packages on similar
printed circuit boards and will be used in that capacity in
the interpretation of results.
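
As a minimal sketch of how equations 1 and 2 are applied, the two functions below compute a conduction resistance from geometry and a measured resistance from a temperature rise. The layer dimensions, conductivity, and measurement values are hypothetical and chosen only to make the arithmetic easy to follow; none of them come from the article.

```python
def conduction_resistance(thickness_m, conductivity_w_per_m_k, area_m2):
    """Equation 1: R = L / (k A), in degrees C per watt."""
    return thickness_m / (conductivity_w_per_m_k * area_m2)


def measured_resistance(delta_t_c, heat_flow_w):
    """Equation 2: R = delta T / Q, in degrees C per watt."""
    return delta_t_c / heat_flow_w


# Hypothetical interface layer: 0.1 mm thick, k = 1.0 W/(m K), 15 mm x 15 mm.
r_layer = conduction_resistance(0.1e-3, 1.0, 0.015 * 0.015)

# Hypothetical measurement: 10 degree C rise while 80 W flows to the sink.
r_measured = measured_resistance(10.0, 80.0)

print(f"layer: {r_layer:.3f} C/W, measured: {r_measured:.3f} C/W")
```

Equation 1 shows why a thin, low-conductivity layer can dominate the path: halving the thickness or doubling the conductivity halves its resistance.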



Modeling Technique 

A software tool employing a finite difference method was
used to create models to represent the cooling of the
packages under test. One model was created for the lidded
design and a second was created for the lidless design. With
each model either the copper block or the aluminum
evaporator could be activated as the heat sink.
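
The article does not name the software tool or show its equations. As a generic illustration of what a finite difference method does, the sketch below solves steady one-dimensional conduction between two fixed temperatures by repeatedly replacing each interior node with the mean of its neighbours. The function name, node count, and boundary temperatures are assumptions made for this example only.

```python
def solve_steady_1d(t_left, t_right, nodes, tol=1e-9, max_sweeps=100_000):
    """Steady 1-D conduction, uniform conductivity, no internal generation.

    Each interior node relaxes to the mean of its two neighbours, which is
    the finite difference form of d2T/dx2 = 0 on a uniform grid.
    """
    temps = [t_left] + [t_right] * (nodes - 1)
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for i in range(1, nodes - 1):
            new_value = 0.5 * (temps[i - 1] + temps[i + 1])
            biggest_change = max(biggest_change, abs(new_value - temps[i]))
            temps[i] = new_value
        if biggest_change < tol:
            break
    return temps


profile = solve_steady_1d(t_left=80.0, t_right=40.0, nodes=11)
print([round(t, 2) for t in profile])
```

A real package model is three-dimensional and includes heat sources, but the same idea applies: the geometry is cut into a grid and the node temperatures are iterated until they stop changing.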

Two simplifications were made in modeling the packages.
Components of the model that were thin layers, such as the
epoxy and grease layers, were modeled as internal plates
with only one-dimensional conduction, normal to the surface
of the layer. Secondly, to simplify the model and reduce
large grid aspect ratios and thus convergence time, geometry
that was nearly coincident and thermally insignificant was
spatially aligned. For example, the plastic socket housing
is 0.7 mm larger than the ceramic but was modeled as the
same overall size.

The FR-4/copper multilayer printed circuit board was modeled
as a solid FR-4 block with a single layer of copper of
thickness equivalent to the combined thicknesses of the
copper layers in the board. The conductivity of the
multilayer printed circuit board was calculated to be equivalent
to the copper and FR-4 material in parallel, while the
conductivity of the single copper layer placed within the modeled
printed circuit board was made equivalent to the copper and
FR-4 material in series. Only solid copper planes were
included in the model since discontinuous signal planes have
been determined to be inconsequential in conducting heat.

To simplify the 1089 individual metallic contacts of the socket
in the plastic housing, a block of equivalent conductivity to
the 1089 individual (L()S>-mm-diameter molybdenum wires
was combined in parallel with the conductivity of the plastic
housing.

Similarly, the solder bump layer with underfill was modeled
as the area of 2500 solder bumps in parallel with the area of
the underfill compound, with the conductivity of the internal
plate appropriately weighted by the product of the thermal
conductivity and the area of each material.
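
The parallel and series combinations used for the board, socket, and solder bump layers can be sketched as follows. The conductivities, area fractions, and thicknesses are illustrative placeholders, not values from the article, and the function names are hypothetical.

```python
def parallel_conductivity(conductivities, areas):
    """Area-weighted conductivity of materials conducting side by side."""
    total_area = sum(areas)
    return sum(k * a for k, a in zip(conductivities, areas)) / total_area


def series_conductivity(conductivities, thicknesses):
    """Effective conductivity of layers stacked along the heat-flow path."""
    total_thickness = sum(thicknesses)
    return total_thickness / sum(t / k for k, t in zip(conductivities, thicknesses))


# Hypothetical solder bump layer: bumps cover 10% of the area, underfill the rest.
K_SOLDER, K_UNDERFILL = 50.0, 0.5          # W/(m K), illustrative only
k_bump_layer = parallel_conductivity([K_SOLDER, K_UNDERFILL], [0.10, 0.90])

# Hypothetical board stack-up: 0.2 mm of copper and 1.4 mm of FR-4 in total.
# For the parallel case the conducting areas scale with the layer thicknesses.
K_COPPER, K_FR4 = 390.0, 0.3               # W/(m K), illustrative only
k_parallel = parallel_conductivity([K_COPPER, K_FR4], [0.2, 1.4])
k_series = series_conductivity([K_COPPER, K_FR4], [0.2, 1.4])

print(k_bump_layer, k_parallel, k_series)
```

The parallel value is dominated by the better conductor and the series value by the poorer one, which is why the two combinations give such different results for the same materials.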

The copper block was modeled as an isothermal volume
with a negative internal power source (i.e., a sink). The
evaporator assembly, while more difficult to approximate,
was modeled as an aluminum block with negative internal
power sources that were of the same volume and in the
same locations as the heat pipes used in the experiments.
The actual cross sections of the heat pipes were modeled
as squares because of the orthogonal limitations of the
software tool.

Fig. 6. Experimental setup with heat pipe.

The models were constructed to calculate conduction
through the package to study the effects of various
constructions and materials. To simplify and reduce convergence
time, cooling from natural convection was not considered.
This method allows good comparative results for small
changes in materials but does not yield results that could
be directly compared with measurements. Nevertheless, the
purpose of the modeling was not to correlate numerical data
with experimental data, but rather to determine whether
experimentation would be worthwhile.

After the models were created, grid sensitivity calculations 
were done to ensure that the results were not affected by
numerical computation errors induced by grid size or aspect
ratios.

Modeling Results 

Copper Block. The modeling results for the copper block are 
presented in Table I. These results show that the thermal
resistance between the die and the heat sink of the two
package styles was identical, within modeling error and for
the assumptions made in the model.



Table I
Modeled Thermal Resistance for Copper Block

Package Type    Thermal Resistance (°C/W)

Lidded          0.21
Lidless         0.21



Aluminum Evaporator. The results for the aluminum evaporator
are shown in Table II. Again, the thermal resistance between
the die and the heat sink was nearly identical between the
two designs. The model shows a small benefit in the lidded
design.

Modeling Summary. Given the considerable assumptions and
simplifications, it was difficult to draw a strong conclusion
based solely on the modeling results. Considering the small
differences between the two designs, it was very compelling
to construct the packages and measure them.



Table II
Modeled Thermal Resistance for Aluminum Evaporator

Package Type    Thermal Resistance (°C/W)

Lidded          0.21
Lidless         0.26



Measurement Results 

Temperature measurements were taken for each of the four
package and heat sink combinations. The results are
presented in Tables III and IV. Included in each table are the
power dissipation, heat sink temperature, die temperature,
and thermal resistance. The thermal resistance column refers
to the thermal resistance between the die and the heat sink.
It includes the separate resistances of the die, epoxy and lid
(if applicable), thermal grease, and a portion of the heat sink
through which the thermocouples were embedded.

Copper Block. Table III displays thermal data from each
package using the copper block heat sink and Dow Corning
340 thermal grease at the heat sink/package interface.
Note that the thermal resistance decreased by 50% with
the removal of the lid.



Table III
Thermal Performance of Packages with Copper Block

Package   Power             Heat Sink          Die                Thermal
Type      Dissipation (W)   Temperature (°C)   Temperature (°C)   Resistance (°C/W)

Lidded    93.3              40.2               55.1               0.16
Lidless   mj^               40.6               47.6               0.08



Aluminum Evaporator. Data from the two packages with the
aluminum evaporator acting as the heat sink and Dow Corning
340 thermal grease at the interface is presented in Table
IV. Note that both the packaged die temperatures and the
heat sink temperatures increased using the aluminum
evaporator because it is not as efficient as the copper block. The
thermal resistance decreased slightly for each package type
over that obtained in Table III. This is most likely because of
differences in thermal grease application or thermocouple
placement. Finally, the thermal resistance decreased 53%
upon removal of the lid. As expected, the decrease in thermal
resistance is independent of the type of heat sink used.



Table IV
Thermal Performance of Packages with Aluminum Evaporator

Package   Power             Heat Sink          Die                Thermal
Type      Dissipation (W)   Temperature (°C)   Temperature (°C)   Resistance (°C/W)

Lidded    85.8              63.9               77.1               0.15
Lidless   85.3              66.2               72.2               0.07
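
As a check on Table IV, the thermal resistance column can be recomputed from the other three columns with equation 2. The sketch below uses only the Table IV values; the function and variable names are chosen for this example.

```python
def thermal_resistance(power_w, sink_temp_c, die_temp_c):
    """Equation 2 applied between the die and the heat sink."""
    return (die_temp_c - sink_temp_c) / power_w


# Table IV rows: power in W, heat sink temperature in C, die temperature in C.
lidded = thermal_resistance(85.8, 63.9, 77.1)
lidless = thermal_resistance(85.3, 66.2, 72.2)

# Round to two decimals, as the table does, before comparing the packages.
r_lidded = round(lidded, 2)
r_lidless = round(lidless, 2)
reduction_pct = 100.0 * (r_lidded - r_lidless) / r_lidded

print(r_lidded, r_lidless, round(reduction_pct))
```

Using the rounded table entries reproduces the 53% reduction quoted in the text.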



Measurement Summary. The measured thermal resistance of
the lidded package compares very favorably with
measurements taken by other investigators.

Table V
Allowable Power Dissipation for Equivalent Die Temperatures
of 110°C and Air Temperature of 50°C

Table V displays the amount of power that can be dissipated
by each heat sink/package combination at equivalent die and
air temperatures. The heat sink thermal resistance refers to
the thermal resistance between the heat sink thermocouples
and the ambient air. The results indicate that the lidless
package is a significantly better performer than its lidded
counterpart. The lidless package attached to the copper
block is able to dissipate 34% more power, or 62 watts more
than the lidded version. Likewise, for the aluminum
evaporator, the lidless package is able to dissipate 15% more power
or 15 watts more. Note that a larger relative improvement is
realized by using a more efficient heat sink. These
calculations assume no losses other than through the heat sinks but
clearly show the superiority of lidless package designs over
lidded.
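
The allowable power comparison can be sketched by treating the path from die to air as two thermal resistances in series, so that P = (T_die - T_air) / (R_package + R_sink). The package resistances below are the measured values from Tables III and IV. The sink-to-air resistances and the die temperature limit are assumptions made for this example, not values taken from Table V.

```python
def allowable_power(die_limit_c, air_temp_c, r_package, r_sink):
    """Power that holds the die at its limit through two resistances in series."""
    return (die_limit_c - air_temp_c) / (r_package + r_sink)


DIE_LIMIT_C = 110.0   # assumed die temperature limit
AIR_TEMP_C = 50.0     # air temperature

# (lidded, lidless) die-to-sink resistances from Tables III and IV, in C/W.
PACKAGE_R = {"copper block": (0.16, 0.08), "aluminum evaporator": (0.15, 0.07)}
# Sink-to-air resistances in C/W: hypothetical placeholders.
SINK_R = {"copper block": 0.17, "aluminum evaporator": 0.45}

gains = {}
for sink, (r_lidded, r_lidless) in PACKAGE_R.items():
    p_lidded = allowable_power(DIE_LIMIT_C, AIR_TEMP_C, r_lidded, SINK_R[sink])
    p_lidless = allowable_power(DIE_LIMIT_C, AIR_TEMP_C, r_lidless, SINK_R[sink])
    gains[sink] = 100.0 * (p_lidless - p_lidded) / p_lidded
    print(f"{sink}: {p_lidded:.0f} W lidded, {p_lidless:.0f} W lidless, "
          f"{gains[sink]:.0f}% gain")
```

Because the package resistance is a larger share of the total when the sink resistance is small, removing the lid yields a larger relative gain on the more efficient heat sink.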

The superiority of the lidless package over the lidded, while 
expected, may not be as obvious to predict as it first appears.
One of the main arguments for keeping the lid on the package
is that it decreases the heat flux by increasing the surface
area through which heat can leave the package to the heat
sink. By a one-dimensional analysis, it can be shown that the
thermal resistance of the lid is an order of magnitude less
than that of the epoxy. This indicates that using the lid as a
heat spreader to decrease the heat flux through the package
is not necessarily a bad idea. Rather, it is the bonding of the
lid to the die with a layer of epoxy that makes it a relatively
poor thermal solution. If a lid must be used for reasons other
than thermal performance, it is clear that an effort should be
made to reduce as much as possible the thermal resistance
of the bonding material by decreasing its thickness and/or
increasing its thermal conductivity.



Summary and Conclusions

The results from the modeling showed that the thermal
performances of the packages were very similar and the
lidless design warranted further investigation through lab
measurements.

Comparison of the thermal resistances of the two package
styles was very consistent for both the copper block and the
aluminum evaporator measurement methods. Both
measurement methods showed about a 50% improvement in thermal
resistance in the lidless design.

While impractical for low-cost computer systems, the liquid
cooled copper block measurements determine some limits
of cooling of the HP PA 8000 die. The lidded design could
dissipate 180 watts of power while the lidless solution could
dissipate 242 watts while maintaining the temperature of the
die within the limits for reliable operation.

The measured results indicate that the lidless package is
thermally superior to the lidded design. For the aluminum
evaporator, 15 more watts could be dissipated while
maintaining the same die temperature. This is of particular
significance because a heat pipe assembly is one of the present
cooling designs for the HP PA 8000 processor.

To obtain the thermal performance required in
next-generation chips, the cooling design will need to be solved as a
coupled problem, considering the complete thermal path
originating from the surface of the die and ending in the
cooling air. The lidless package is one possible solution.

sor. Rohit received a BSEE degree in 1985 from the
University of Houston, an MSEE degree in 1988 from
the University of Minnesota, and an MS degree in
engineering management in 1990 from Portland State
University. Before joining HP, he worked at IBM doing
PowerPC CPU design and at Sharp Microelectronics
doing digital signal processor design. Born in New
Delhi, India, Rohit is married and has one daughter.
He enjoys spending time with his family, traveling,
biking, and swimming.

30 PA 8000 Electrical Verification

John W. Bockhaus

John Bockhaus received a BSEE degree in 1991 and an
MSEE degree in 1992, both from the University of Illinois
at Urbana-Champaign. After graduating he became a
member of the technical staff at HP's Systems Technology
Division. His first project involved working on the
postsilicon functional and electrical verification for the HP
PA 7100LC. He then was responsible for the presilicon
design and verification of the debug circuitry for the PA 8000
and later was a part of that processor's postsilicon
electrical verification team. He is currently working on the
presilicon verification of the PA 8500. He is named as
an inventor in five pending patents related to on-chip
debug and diagnosis circuitry. He has coauthored an
article on software pipelining and is a member of the
IEEE. John was born in Peoria, Illinois and is married.
His hobbies include basketball, landscaping, card
games, and traveling.

Rohit Bhatia

Author's biography appears elsewhere in this section.

C. Michael Ramsey 

As a member of the technical staff at HP's Systems
Technology Division, Mike Ramsey recently worked on
the electrical verification of the PA 8000 processor,
including the turn-on of the first silicon, test case
generation, and failure debug, and is currently responsible
for the presilicon verification of the PA 8500. He has
coauthored an article about the thermal modeling of the
effect of nonuniform power generation in high-powered
CPUs. He earned a BSEE degree in 1979 and an MSEE
degree in 1981, both from Texas A&M University. He
joined HP's Data Terminals Division in 1980. He has
worked on processor board design and layout and ASIC
design for numerous terminals and on the design of the
backplane for the HP 9000 8x7 Series computers. Born in
Palo Alto, California, Mike is married and has a son.
His hobbies include gardening, landscaping, and
home improvements. He also studies Tae Kwon Do.

Joseph R. Butler

An engineering project manager at HP's Systems
Technology Division, Joe Butler recently was responsible for
hardware design for the HP PA 8000 processor and is
currently the project manager for the PA 8200 processor.
He has also worked as a test engineer on the PA 7100 and
7150 CPUs. Joe received a BSEE degree in 1988 from the
University of California at Davis and an MSEE degree with
a specialization in VLSI design and computer hardware
from Stanford University in 1993. He joined HP's
Integrated Circuit Business Division in 1988 as a product
engineer.

David J. Ljung 

Born in Madison, Wisconsin, David Ljung received a BS
degree in electrical engineering and computer science
from the University of Wisconsin in 1994. After
graduating he became a member of the technical staff at
HP's Systems Technology Division. He worked on the
electrical characterization of the HP PA 8000 processor
and is currently providing tools support and verification
for the PA 8500 processor. In his free time David enjoys
cooking, spending time with animals, and swing
dancing. He also enjoys outdoor sports such as
snowboarding, skating, and skiing.

36 PA 8000 Interconnect Routing



James C. Fong 

An R&D engineer at HP's Integrated Circuit Business
Division, Jim Fong has been developing the placement
and routing technology used in HP's PA-RISC CPU chips
and is currently working on the over-the-block detailed
router in the PA_Route system. He has authored half a
dozen technical papers for HP Design Technology
Conferences. He is professionally interested in VLSI design,
CAD, software engineering, and IC routing. He earned a
BS degree in electrical engineering and computer science
from the University of California at Berkeley in 1979. He
then joined HP's Corvallis Division and worked on
chip manufacturing for the HP 41C and HP 10 Series
calculators. Jim was born in San Francisco, is married,
and has two children. He sits on the board of
trustees of a nonprofit organization that serves
disabled children. In his free time he enjoys music,
fishing, and gardening.

Hoi-Kuen Chan

Born in Hong Kong, Hoi-Kuen Chan earned a BS degree in
mathematics with honors from the University of Singapore
in 1979 and an MSCS degree from San Jose State
University in 1993. She joined HP Laboratories in 1984,
where she was responsible for bringing HP HARP-R6 from a
lab prototype to a full-featured product. She then
transferred to HP's Integrated Circuit Business Division to
participate in the continuous development of HARP-RB.
In 1993 she began designing and developing the
global router in the PA_Route system and is currently
focusing on the power gridding and a new timing-driven
global router. She is professionally interested
in CAD and software engineering and has coauthored
four papers on floor planning, placement and routing
for several HP Design Technology Conferences. Before
coming to HP she worked as a software engineer at
Fairchild Camera and Instrument. Hoi-Kuen is married
and has two children. Her hobbies include reading,
cooking, hiking, and aerobics.

Martin D. Kruckenberg

Martin Kruckenberg is an R&D engineer at HP's
Integrated Circuit Business Division and is currently
developing timing analysis methodologies and working
on the new global router in the HP PA_Route system. He
recently implemented the systems database and input
spigots. Previously he implemented the database and
input spigots and designed detailed routers for PA_Route.
He received BS and MS degrees in computer engineering
in 1989 and 1990 from the University of California at
Santa Cruz. After graduating he joined HP and worked as
an application engineer on the HARP project. He is
professionally interested in deep submicron timing
issues and code reuse and automatic generation. He
has coauthored a paper on block routing in the HP
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CmSU IC process in his trr - - — ^ '-- - - -- 

acTvlic paimiog anii ^n^f: 

44 HP OpenCall Technology

Tarek Dehni 

An R&D engineer at HP's Telecom Networks Division,
Tarek Dehni is involved in the development of the HP
OpenCall IN platform. He recently worked on the
specification, development, and test of the IN product
connectivity layer for the HP OpenCall SS7 TCAP protocol.
He represents HP at an ETSI standardization IN group.
Born in Damascus, Syria, Tarek earned a B.Sc. degree in
electrical engineering in 1985 from Damascus University.
He went on to receive an M.Sc. degree in optical and
microwave telecommunications in 199? from ENSEA in
France and an M.Sc. degree in networks and
communication systems in 1993 from ENSEEIHT in France.
He joined HP in 1995 and worked on the service
switching point simulator running on the HP 9000
Series 700 computer, extending the code to support
the ISDN-UP signaling simulation. In his free time,
Tarek enjoys going to the cinema, eating good food,
and playing soccer.

John M. O'Connell

An architect at HP's Telecom Networks Division, John
O'Connell is responsible for the HP OpenCall IN
platforms, leading the technical investigations on the
evolution of the platforms while offering consultations to
partners and customers on the current versions.
Previously he was a technical lead responsible for the
development of the HP OpenCall service logic execution
environment, with particular emphasis on fault-tolerance
aspects. Born in Tralee, Ireland, John earned a BSc degree
in mathematics and computer science in 1986 from the
University College of Cork in Ireland. After graduating
he joined HP Laboratories, where he initially
investigated formal specification languages for concurrent
and distributed systems. He then began working in
the area of distributed systems and conducted a
study of existing distribution infrastructures.
Afterwards he began to focus his attention on the use of
distributed technology to achieve the high availability
required for telecom platforms. He is professionally
interested in distributed systems, fault tolerance, and




'-- '"-^ 'at I on ot tniefnet t&chnology and tefeeom- 
■ .m ^ &ijoys ootdoor spans such as soccer. 
yu • diiii skiing He also enjoys the cinenra, 

Nicolas Raguideau

Nicolas Raguideau is a telecom consultant at HP's
Telecom Networks Division. Since 1994 he has been
responsible for R&D programs concerning telecom service
infrastructure, including evaluating new distributed
processing software technologies, prototyping telecom
systems, consulting with HP sales organizations, and
collaborating with HP Laboratories. He recently consulted
on the evolution of network systems and architecture in
intelligent networks and is currently investigating new
telecom service infrastructures for multimedia. He has
authored over ten articles on his work and is named as
an inventor in two patents on ways to deliver telecom
services that combine telecom and Internet
infrastructures. He received an Ingenieur degree in
computer science and information networks in 1989
from the Ecole Superieure d'Electricite in France.
Before joining HP he worked as a research engineer
at Nippon Telegraph and Telephone Company and as
an engineer at Telesystems. Born in Paris, France,
Nicolas is married and has two children. He served in
Japan's civil service as a researcher in optoelectronics.
His hobbies include learning Japanese Kanji and
mountaineering.

56 HP OpenCall SS7 Platform



Denis Pierrot

An R&D engineer at HP's Telecom Networks Division,
Denis Pierrot recently participated in the design and
implementation of the HP OpenCall SS7 MTP protocol
layer and is currently working on the HP OpenCall
service execution platform. Previously he worked on the
product's real-time operating system for the SCSI
protocol board. He is professionally interested in databases
and real-time software. He is named as an inventor in a
patent involving generic fault-tolerant platforms. Born in
Lyon, France, he graduated in 1989 from Ecole Nationale
des Ponts et Chaussees in Paris with an MS degree in
computer science and from the University Paris IV with a
minor in artificial intelligence. Before joining HP he
worked at Mafben Corporation designing and developing a
transport protocol layer for mobile robotics and at
Matra Espace developing satellite image exploration
tools for French satellites. He joined HP in 1992 and
worked as an R&D software engineer on data terminal
controllers. Denis is married and has three children.
He served in the French mountain artillery from 1989
to 1990. His outside interests include mountain hiking
and swimming. He is also in charge of the squash
program at his HP site.

Jean-Pierre Allegre

Born in Grenoble, France, Jean-Pierre Allegre graduated
from Ecole Nationale Superieure d'Electronique et
de Radioelectricite de Grenoble (ENSERG) and from
Ecole Nationale Superieure des Telecommunications de
Paris (ENST). He joined HP in 1986 and is currently an
architect at HP's Telecom Networks Division responsible
for the HP OpenCall SS7 platform and its evolution.
Previously he was the technical lead on the IN platform
node management project and before that was the lead
on a project that developed the OSI transaction
processing protocol and the OSI common management
information protocol and investigated the TMN
architecture. He is professionally interested in distributed
systems, fault tolerance, and real-time systems. He
authored an article about X.25 support for the HP Journal
in 1990. Jean-Pierre is married and has three children. In
his free time he enjoys woodworking and outdoor
activities such as bicycling, skiing, tennis, and swimming.

63 High Availability



Brian C. Wyld

A technical lead at HP's Telecom Networks Division,
Brian Wyld was responsible for the HP OpenCall SS7
platform that involved the creation of the fault-tolerant
platform and the systems control application and
low-level high availability libraries. He is currently the
lead on the product's interface project, working on the
architecture of the distributed systems platform. He is
named as an inventor in two patents involving a failure
detection method in multiroute channels and a generic
fault tolerant platform. He is professionally interested in
LAN networking and fault tolerant systems and is a
member of the IEEE. He has authored two papers on
technical hints for Windows 3.1 programming. Born
in Bellshill, Scotland, he received an MS degree in
electrical and electronic engineering from Heriot-Watt
University in Scotland in 1989. Before joining
HP in 1993, he worked at Spider Systems Ltd., working
on LAN monitoring and analysis and on OEM software
protocol stacks. He also worked at PMS Ltd. on
home computer products. Brian is married and has
one daughter. His hobbies include house repairs,
gardening, and skiing. He also drives and repairs classic
cars and is currently the owner of a 1969 Triumph
Vitesse MkII.

Jean-Pierre Allegre 

Author's biography appears elsewhere in this section.

70 Plasma Mass Spectrometer

Yoko Kishi

Yoko Kishi is a senior ICP-MS application chemist at
Yokogawa Analytical Systems and is responsible for
ICP-MS application marketing. She is currently doing
ICP-MS application development and research at the
University of Sheffield in the United Kingdom and
previously worked on the development of the HP 4500.
She received a BS degree in environmental chemistry in
1986 from Keio University in Tokyo, Japan. After
graduating she joined Yokogawa Electric Corporation
(YEC) as an application engineer in the R&D group and
later in the sales group. In 1992 she moved to Yokogawa
Analytical Systems, Inc., the joint venture company being
established between HP and YEC, and initially worked as
an application engineer in sales and later in the customer
support center. Yoko is a member of the Japan Society
of Analytical Chemistry. She was born in Tokyo, Japan
and is married. In her free time she enjoys golf and
Japanese flower arranging called ikebana, for which
she has an instructor's license.

78 Object DBMS

Timothy P. Loomis

Software engineer Tim Loomis is currently a technical
lead at HP's Chemical Analysis Solutions Division.
In the last nine years he has worked on the software
design and development of several laboratory information
and database products, including HP ChemLMS, LAG/UX,
ChemlCAL, and ChemStudy. He is professionally interested
in information systems design. He received a PhD in
geology and geophysics in 1971 from Princeton University.
He served on the faculty at Yale, UCLA, and the University
of Arizona. His principal research was in the area
of computer simulations of nonequilibrium
thermodynamic models of multicomponent diffusion and
crystal growth in minerals. He has authored over
thirty papers in the fields of geophysics, geology,
applied artificial intelligence methods, and software
prototyping. He also wrote about the origins of pottery
of prehistoric Indians in Southern Arizona. After
leaving the university system and before joining HP,
he did consulting for two years in the database
modeling field. Tim was born in California. He has two
teenage daughters. In his free time, he enjoys vigorous
outdoor activities including sea kayaking, skiing,
bike touring, backpacking, and trekking in remote
areas. At the time of this publication, he plans to be
sea kayaking off Ellesmere Island in the Arctic.

88 Policing in ATM Networks 



Mohammad Makarechian 

Mohammad Makarechian earned a BS degree in computer
engineering in 1992 from the University of Alberta,
Canada and an MS degree in computer engineering in
1994 from Boston University. After graduating he joined
HP's Communications Measurements Division, where he
has worked on software development, performance
analysis, and quality assurance. He is currently developing
application software for modeler components of the HP
Broadband Series Test System and recently worked
on the HP E4223 ATM policing and traffic
characterization software. Before that he led the software
development for the HP E4219 ATM network impairment
emulator and contributed to the HP E4209 cell
protocol processor. Mohammad was born in Tehran,
Iran. He is a member of the Association for Computing
Machinery (ACM) and is involved with a special
ACM interest group on data communications.

Nicholas J. Malcolm 

A software developer at HP's Communications
Measurements Division since 1994, Nicholas Malcolm
was the technical lead for the development of the
HP E4223A ATM policing and traffic characterization
test application. He is currently developing software
for an ATM operations and maintenance (OAM) tester
module in the HP Broadband Series Test System. He is
professionally interested in real-time communication,
distributed systems, and software design and is a
member of the IEEE. Born in Murray Bridge, Australia,
Nicholas received a B.Sc. degree with honors in
computer science from the University of Adelaide in
1989 and an M.Sc. degree in computer science from the
University of Calgary in 1991. He went on to earn a
PhD in computer science from Texas A&M University
in 1994.

94 MOSFET Scaling



Paul Vande Voorde

Paul Vande Voorde received a PhD degree in solid-state
physics from the University of Colorado in 1980.
In 1981 he joined HP Laboratories, working initially
in the area of inkjet printhead fabrication, and then
in the area of silicon device fabrication and modeling.
He has contributed to the development of advanced
CMOS, bipolar, and BiCMOS processes. He was a member
of the HP-25 bipolar process development team. His
present areas of research are process and device
simulation as applied to highly scaled CMOS devices.
He has authored or coauthored fifteen technical papers
on silicon processing, process modeling, and device
modeling. He is named as an inventor in five patents
concerning silicon processing and coauthored a
textbook called Computer-Aided Design and VLSI Device
Development, which
was published by Kluwer Academic Publishers in
Boston, Massachusetts in 1988. Born in Chamberlain,
South Dakota, Paul served in the U.S. Army from
1972 to 1974. He is married and has one child. In his
free time he enjoys playing with his son and going
hiking.
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99 Clock FM for EMI Reduction 



Cornelis D. Hoekstra

Casey Hoekstra is an engineer at HP's Integrated
Circuit Business Division and is currently a project
leader for ASIC development. He is named as a
coinventor in a patent on testing integrated circuit
pad input and output structures. He received BA
and MS degrees in physics in 1976 and 1977,
respectively, both from the University of Oregon. He
joined the thermal printhead group at HP's Corvallis
Site Operations in 1977 and a year later moved to CMOS
process engineering, where he worked in
photolithography, yield and reliability, test, and
design engineering. He coauthored a 1987 HP Journal
article about the development and application of test
hardware and software for the multichip hybrid printed
circuit board used in advanced handheld calculators,
such as the HP Business Consultant calculator. Born
in Schyndel, The Netherlands, Casey is married and
has two daughters. He served in the U.S. Army as a
medic from 1970 to 197X. His outside interests include
working around the house, gardening, hiking,
camping, and climbing.

105 HDL Microprocessor Porting

Jim J. Lin

Jim Lin received a BS degree
in electrical and computer
engineering in 1994 from
Carnegie Mellon University
and an MSEE degree in
1996 from Stanford University.
He worked as a summer
intern at HP's Integrated Circuit Business Division in
1993 and began working full-time as a design engineer
in 1994, implementing embedded microprocessors
and integrating them on a single chip with other ASIC
functionality. For the project described in this issue
he worked on modifying the HDL for the cache controller
and on synthesis, simulation, and verification.
He is currently responsible for designing several
control paths for the next-generation processor and for
its overall simulation and synthesis strategy. He is
professionally interested in processor microarchitecture
modeling and synthesis. Born in Shanghai, China,
he is married and enjoys outdoor activities and sports
such as camping, hiking, soccer, and basketball. He is
a big sports fan and would love to try his hand at
sportscasting.

112 3V CMOS Operational Amplifier

Derek L. Knee

Derek Knee is a technical contributor and design
engineer at HP's Integrated Circuit Business Division.
He recently designed the general-purpose 3V CMOS
operational amplifier described in this issue and is
currently designing the analog front end for an
optical position encoder for a handheld scanner. He is
named as an inventor in three patents concerning
programmable integrated circuits, RF emissions, and
fully differential flash ADCs. He is a member of the
IEEE and is professionally interested in analog design
methodologies. He received BSEE and MSEE degrees in
1979 and 1981, respectively, from the University of
Natal in South Africa. Before joining HP he worked at
Exar Corporation designing analog bipolar and CMOS
ASICs and at Samsung Semiconductor designing
high-speed PRML disk drive channels using BiCMOS
technology. Since coming to HP in 1987 he has also
designed CMOS14 analog cells, a servo controller, and
other circuits. Born in Pinetown, Natal, South Africa,
Derek served in the South African military in 1975.
Married, he is a triathlete and has been a drummer
since the age of 9.

Charles E. Moore 

A technical contributor at HP's Integrated Circuit
Business Division, Charles Moore is currently the
technical lead on an optical position encoder chip
and is [...] consultations on another chip with
optical content. He has worked with HP for [...]
named as an inventor in over fifteen patents on lens
design, IC design, and the system design of
instruments. He has authored several articles in the
HP Journal on his work. He is a member of the Optical
Society of America and is a past president of the
Rocky Mountain chapter. He is professionally
interested in analog IC design, optics, and system
design. He received a BSEE degree from the University
of California at Berkeley in 1966 and an MS degree in
optics from the University of Rochester in New York
in 1978. His HP projects include working as a product
and process engineer on a silicon thermal rms
converter and working on the optical, system, and
receiver electronics design for the
HP 3920 surveying total station. Born in Santa Fe,
New Mexico, he served in the U.S. Army from 1960
to 1963. Charles is married and has five children and
one grandchild. He has done volunteer work with the 
Democratic party and his hobbies include playing 
chess, directing chess tournaments, and studying the 
history of technology. 

120 Heat Transfer from a Flip-Chip
Package 

Cullen E. Bash

A product development engineer at HP's Network Server
Division, Cullen Bash is currently working on the
thermal and mechanical design of the HP NetServer
line of network servers. Cullen received BS and MS
degrees in mechanical engineering in 1994 and 1995,
respectively, from the University of California at
San Diego. He then joined HP's Systems Technology
Division. For the HP 9000 Model T600 corporate
business server, he was responsible for the
thermal design of the memory board assembly and
for the thermal and mechanical design of the I/O
assembly (HSC bus converter).

Richard L. Blanco

Rich Blanco is an R&D project manager at HP's Systems
Technology Division. He joined HP in 1984 after
receiving a BS degree in fluids and thermal science
from Case Western Reserve University. In 1994 he
earned an MS degree in mechanical engineering from
Stanford University. Rich is married and has two
children. In his free time, he enjoys spending time
with his family, especially playing soccer and going
camping.
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Reader Forum 



and will publish letters expected to be of interest to our readers. Letters must be brief and
are subject to editing. Letters should be addressed to:

Editor, Hewlett-Packard Journal
3000 Hanover Street, 20BH
Palo Alto, CA 94304, U.S.A.



Editor: 

In the recent article entitled "The Global Positioning System
and HP SmartClock" by John A. Kusters, which appeared in
the December 1996 issue of the Hewlett-Packard Journal, the
relationship of Global Positioning System (GPS) time to
Coordinated Universal Time (UTC) was inadequately explained.

1. UTC is a time scale maintained officially by the BIPM
(Bureau International des Poids et Mesures) using clocks
from a number of laboratories around the world. UTC is
available to users with a delay of up to two months
because of the need to analyze the contributed clock data
very carefully. It is adjusted occasionally by one second
so that the absolute value of the difference between UTC
and the astronomical time scale (UT1) does not exceed
0.9 second. These one-second adjustments, called leap
seconds, are not made in GPS time. As a result, GPS time
will be different from UTC by an integral number of
seconds plus a small synchronization error. As of July 1, 1997,
the difference is essentially 12 seconds.

2. Similarly, the U.S. Naval Observatory (USNO) maintains a
time scale that, by international agreement, is steered to
be near UTC. Approximately forty HP 5071A primary
frequency standards and ten hydrogen masers are used to
accomplish this. The USNO master clock provides a
real-time realization, UTC(USNO MC), of the calculated USNO
time scale. Over the past year, the difference between
UTC and UTC(USNO MC) has not exceeded thirty
nanoseconds.

3. The timing of the GPS system (GPS time) is maintained
by the GPS Master Control Station (MCS) at Falcon AFB,
Colorado Springs, CO, using observed time differences
between the USNO Master Clock and the GPS time
broadcast by the satellites in the GPS system. This data is
collected by the Naval Observatory and made available to
the MCS daily. Using this information, GPS time is steered
toward UTC(USNO MC) by MCS personnel.

4. The GPS system not only maintains GPS time, it also
provides information in the navigation message broadcast by
each satellite to enable the user to extract a representation
of UTC(USNO MC). Currently the rms difference between
UTC(USNO MC) and the UTC available from the satellites
is in the neighborhood of 8 nanoseconds. By agreement,
GPS time is to be maintained within 1 microsecond of
UTC(USNO MC) absent the integral seconds, and the
representation of UTC(USNO MC) available from the GPS
system is to be kept accurate within 100 nanoseconds.

5. To satisfy the many users of precise time and frequency,
the Naval Observatory continues to maintain its time
scales as close as possible to that provided by BIPM.
These users include (but are not limited to) users of the
GPS system. No plans to change the agreements
mentioned above are in place or contemplated.

Dennis D. McCarthy
Director, Directorate of Time
U.S. Naval Observatory

John A. Kusters
Principal Scientist
Hewlett-Packard Company
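
The integral-second bookkeeping described in point 1 of the letter can be illustrated with a short Python sketch. It is not part of the letter and is only one possible way to express the arithmetic. The names gps_minus_utc and gps_to_utc are hypothetical, and the table is an assumption added here: it lists the UTC dates of the twelve leap seconds between the GPS epoch (January 6, 1980) and July 1, 1997. The sketch ignores the small synchronization error and cannot represent the inserted leap second (23:59:60) itself.

```python
from datetime import datetime, timedelta

# UTC dates at which the GPS - UTC difference grew by one second,
# from the GPS epoch (1980-01-06) through July 1, 1997.
LEAP_SECOND_DATES = [
    datetime(1981, 7, 1), datetime(1982, 7, 1), datetime(1983, 7, 1),
    datetime(1985, 7, 1), datetime(1988, 1, 1), datetime(1990, 1, 1),
    datetime(1991, 1, 1), datetime(1992, 7, 1), datetime(1993, 7, 1),
    datetime(1994, 7, 1), datetime(1996, 1, 1), datetime(1997, 7, 1),
]


def gps_minus_utc(utc: datetime) -> int:
    """Whole-second GPS - UTC difference in effect at a UTC instant."""
    return sum(1 for date in LEAP_SECOND_DATES if utc >= date)


def gps_to_utc(gps: datetime) -> datetime:
    """Convert a GPS-time reading to UTC (integral seconds only)."""
    # The table is indexed by UTC, so make a first guess from the GPS
    # reading and then recheck the offset against the resulting UTC.
    # This matters only in the few seconds after a leap second.
    first_guess = gps - timedelta(seconds=gps_minus_utc(gps))
    return gps - timedelta(seconds=gps_minus_utc(first_guess))


if __name__ == "__main__":
    print(gps_minus_utc(datetime(1997, 7, 1)))
    print(gps_to_utc(datetime(1997, 8, 1, 12, 0, 12)))
```

A table lookup is used because leap seconds are announced rather than computed, so any such table has to be extended whenever a new one is scheduled.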
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